Solução de integração e avaliação de softwares de anotação genômica em Coffea spp. [Integration and evaluation solution for genome annotation software in Coffea spp.]

Bibliographic details
Year of defense: 2020
Main author: Cantelli, Geraldo Cesar
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Dissertation
Access type: Open access
Language: por
Defending institution: Universidade Tecnológica Federal do Paraná
Cornelio Procopio
Brazil
Programa de Pós-Graduação em Bioinformática
UTFPR
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://repositorio.utfpr.edu.br/jspui/handle/1/5420
Abstract: One of the biggest challenges in bioinformatics is the analysis of complete genomes, for instance the identification of computationally predicted genes and their association with the respective biological functions. It is therefore important to design experiments that test these predictions and compare them with existing ones so that their performance can be measured. With the growing volume of available genomic and transcriptomic data, efficient and affordable pipelines for good gene annotation are needed. How can genome annotation be improved, avoiding over- or under-prediction, to obtain greater accuracy? In this work we study which characteristic is most valuable in genome annotation software by comparing two programs, PASA and MAKER, on the genomes of Coffea canephora, C. eugenioides and C. arabica. We also improved the quality of these Coffea genome annotations and performed a statistical comparison between the two programs. In addition, we propose an automated tool that allows some of the analyses performed in this work to be repeated. The results show the effectiveness of detecting all alternative splicing possibilities in the annotation algorithm: PASA found more exclusive genes than MAKER and located genes evenly across different regions of the chromosomes, which is difficult for many gene predictors. New versions of the genome annotations of C. arabica, C. canephora and C. eugenioides were generated to be made available to the scientific community. The Ensemble Solution program was developed to enable the evaluation of genome annotation software and GFF3 files, to generate lists of exclusive genes and Venn diagrams, to import GenBank properties and to produce more complete reports.
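The abstract mentions lists of exclusive genes and Venn diagrams derived from GFF3 files. A minimal Python sketch of that kind of comparison follows. It is not the Ensemble Solution program itself: the function names, the toy GFF3 records and the rule that two genes match only when their sequence, start, end and strand are identical are all assumptions made for illustration. The abstract does not say which matching criterion the dissertation uses.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Gene(NamedTuple):
    """Locus of a gene feature (hypothetical helper type)."""
    seqid: str
    start: int
    end: int
    strand: str


def read_genes(gff3_lines: Iterable[str]) -> Set[Gene]:
    """Collect the loci of all `gene` features from GFF3 lines."""
    genes = set()
    for line in gff3_lines:
        line = line.rstrip("\n")
        # Skip blank lines, directives (##...) and comments (#...).
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # A GFF3 feature line has nine tab-separated columns;
        # column 3 holds the feature type.
        if len(cols) != 9 or cols[2] != "gene":
            continue
        genes.add(Gene(cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return genes


def venn_counts(a: Set[Gene], b: Set[Gene]) -> Dict[str, int]:
    """Sizes of the three regions of a two-set Venn diagram."""
    # Assumption: genes are "the same" only on exact coordinate match.
    return {"only_a": len(a - b), "shared": len(a & b), "only_b": len(b - a)}


# Toy annotations, invented for this example (not real Coffea data).
PASA_GFF3 = (
    "##gff-version 3\n"
    "chr1\tPASA\tgene\t100\t900\t.\t+\t.\tID=g1\n"
    "chr1\tPASA\tmRNA\t100\t900\t.\t+\t.\tID=g1.t1;Parent=g1\n"
    "chr1\tPASA\tgene\t2000\t3500\t.\t-\t.\tID=g2\n"
    "chr2\tPASA\tgene\t50\t700\t.\t+\t.\tID=g3\n"
)
MAKER_GFF3 = (
    "##gff-version 3\n"
    "chr1\tmaker\tgene\t100\t900\t.\t+\t.\tID=m1\n"
    "chr2\tmaker\tgene\t1000\t1800\t.\t+\t.\tID=m2\n"
)

pasa_genes = read_genes(PASA_GFF3.splitlines())
maker_genes = read_genes(MAKER_GFF3.splitlines())
counts = venn_counts(pasa_genes, maker_genes)
print(counts)
```

Exact coordinate matching is the strictest possible criterion. A real comparison would more likely count overlapping loci as the same gene, which would move genes from the exclusive regions into the shared one.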