A solution for integrating and evaluating genome annotation software in Coffea spp.
Defense year: | 2020 |
---|---|
Main author: | |
Advisor: | |
Examining committee: | |
Document type: | Master's dissertation |
Access type: | Open access |
Language: | por (Portuguese) |
Institution: | Universidade Tecnológica Federal do Paraná (UTFPR), Cornélio Procópio, Brazil; Programa de Pós-Graduação em Bioinformática (Graduate Program in Bioinformatics) |
Graduate program: | Not provided by the institution |
Department: | Not provided by the institution |
Country: | Not provided by the institution |
Keywords (Portuguese): | |
Access link: | http://repositorio.utfpr.edu.br/jspui/handle/1/5420 |
Abstract: | One of the biggest challenges in bioinformatics is the analysis of complete genomes, for instance, identifying computationally predicted genes and associating them with their biological functions. It is therefore important to design experiments that test these predictions and compare them with existing ones, so that their performance can be measured. With the growing volume of available genomic and transcriptomic data, efficient and affordable pipelines for accurate gene annotation are needed. How can genome annotation be improved, avoiding both over- and under-prediction, to achieve greater accuracy? In this work, we study which characteristics are most valuable in genome annotation software by comparing two tools, PASA and MAKER, on the genomes of Coffea canephora, C. eugenioides and C. arabica. We also improved the quality of the annotations of these Coffea genomes and performed a statistical comparison between the two tools. In addition, we propose an automated tool that makes it possible to repeat some of the analyses performed in this work. The results show the effectiveness of detecting all possible alternative splicing events within the annotation algorithm: PASA found more exclusive genes than MAKER and located genes equally well across different regions of the chromosomes, which is difficult for many gene predictors. New versions of the genome annotations of C. arabica, C. canephora and C. eugenioides were generated to be made available to the scientific community. The Ensemble Solution program was developed to enable the evaluation of genome annotation software and GFF3 files, to produce lists of exclusive genes and Venn diagrams, to import GenBank properties, and to generate more complete reports. |
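The abstract says that Ensemble Solution compares GFF3 annotations and lists the genes exclusive to each tool (for example, PASA versus MAKER) to build Venn diagrams, but it does not state how two gene models are matched. The Python sketch below shows one plausible approach under stated assumptions: a gene is treated as shared when it overlaps at least one gene of the other annotation on the same sequence and strand. The function names (`parse_gff3_genes`, `exclusive_genes`, `venn_counts`), the overlap rule and the toy records are hypothetical illustrations, not the program's actual code or data.

```python
"""Sketch: genes exclusive to each of two GFF3 annotations (e.g. PASA vs MAKER).

Hypothetical matching rule (not taken from the thesis): a gene is "shared" if it
overlaps at least one gene of the other annotation on the same seqid and strand.
"""
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Gene(NamedTuple):
    gene_id: str
    seqid: str
    start: int   # 1-based, inclusive (GFF3 column 4)
    end: int     # 1-based, inclusive (GFF3 column 5)
    strand: str  # "+", "-", "." or "?"


def parse_gff3_genes(lines: Iterable[str]) -> List[Gene]:
    """Keep only features whose type (column 3) is 'gene'."""
    genes: List[Gene] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank lines, ##pragmas and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # other feature types, or non-GFF lines such as FASTA
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        fallback_id = f"{cols[0]}:{cols[3]}-{cols[4]}"
        genes.append(Gene(attrs.get("ID", fallback_id), cols[0],
                          int(cols[3]), int(cols[4]), cols[6]))
    return genes


def exclusive_genes(a: List[Gene], b: List[Gene]) -> List[Gene]:
    """Genes of `a` that overlap no gene of `b` on the same seqid and strand."""
    index: Dict[Tuple[str, str], List[Gene]] = defaultdict(list)
    for h in b:
        index[(h.seqid, h.strand)].append(h)
    return [
        g for g in a
        if not any(g.start <= h.end and h.start <= g.end
                   for h in index.get((g.seqid, g.strand), []))
    ]


def venn_counts(a: List[Gene], b: List[Gene]) -> Dict[str, int]:
    """Counts that could feed a two-set Venn diagram."""
    only_a = exclusive_genes(a, b)
    only_b = exclusive_genes(b, a)
    return {"only_a": len(only_a), "only_b": len(only_b),
            "a_with_match": len(a) - len(only_a),
            "b_with_match": len(b) - len(only_b)}


# Toy inputs, invented for illustration only.
SAMPLE_PASA = (
    "##gff-version 3\n"
    "chr1\tPASA\tgene\t100\t500\t.\t+\t.\tID=p1\n"
    "chr1\tPASA\tgene\t900\t1200\t.\t-\t.\tID=p2\n"
    "chr2\tPASA\tgene\t50\t300\t.\t+\t.\tID=p3\n"
).splitlines()
SAMPLE_MAKER = (
    "##gff-version 3\n"
    "chr1\tMAKER\tgene\t450\t800\t.\t+\t.\tID=m1\n"
    "chr1\tMAKER\tgene\t900\t1200\t.\t+\t.\tID=m2\n"
    "chr3\tMAKER\tgene\t10\t20\t.\t+\t.\tID=m3\n"
).splitlines()

if __name__ == "__main__":
    pasa = parse_gff3_genes(SAMPLE_PASA)
    maker = parse_gff3_genes(SAMPLE_MAKER)
    print("only in PASA: ", [g.gene_id for g in exclusive_genes(pasa, maker)])
    print("only in MAKER:", [g.gene_id for g in exclusive_genes(maker, pasa)])
    print(venn_counts(pasa, maker))
```

On the toy data, `p1` and `m1` overlap on chr1 (+), so they count as shared; `p2` and `m2` have the same coordinates but opposite strands, so each is exclusive under this rule, giving `['p2', 'p3']` only in PASA and `['m2', 'm3']` only in MAKER. The pairwise scan is quadratic within each (seqid, strand) bucket, which keeps the sketch short; for whole Coffea genomes a sorted sweep or an interval tree would be the more practical choice.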