Automatic system for the characterization of non-coding RNAs
Year of defense: | 2023 |
---|---|
Lead author: | |
Advisor: | |
Examination committee: | |
Document type: | Dissertation |
Access type: | Open access |
Language: | por |
Defending institution: | Universidade Tecnológica Federal do Paraná, Cornelio Procopio, Brazil, Programa de Pós-Graduação em Bioinformática, UTFPR |
Graduate program: | Not provided by the institution |
Department: | Not provided by the institution |
Country: | Not provided by the institution |
Keywords in Portuguese: | |
Access link: | http://repositorio.utfpr.edu.br/jspui/handle/1/33100 |
Abstract: | Non-coding RNAs (ncRNAs) are RNAs that are transcribed but not translated into proteins. Although their functions are not fully known, ncRNAs take part in many biological processes, mostly regulatory or interaction-based, such as chromatin alteration, transcriptional regulation, nuclear organization, and translation. Two key ways to identify ncRNAs are sequence similarity analysis (alignment), which can be done with BLAST, and structural search, which can be done with INFERNAL. However, joint post-processing of the output of both tools remains a gap. Two major tools, StructRNAfinder and FindNonCoding, have been developed to facilitate ncRNA annotation, but they do not cover all the main strategies for ncRNA identification. To fill this gap, we developed an automatic and scalable system for large-scale ncRNA annotation that uses both sequence and structural search strategies. Our tool uses the most recent version of INFERNAL with Rfam, and BLAST with the RNAcentral database, to identify ncRNAs, and delivers the output as user-friendly reports, files, and statistics. To validate the tool, we benchmarked it against two other tools that aim to facilitate ncRNA annotation (StructRNAfinder and FindNonCoding) on public genomes from RefSeq, Ensembl Plants, and GENCODE. The test dataset contained seven nuclear genomes available in public databases: Chlamydia trachomatis, Drosophila melanogaster, Escherichia coli, and Saccharomyces cerevisiae from RefSeq; Homo sapiens from GENCODE; and Arabidopsis thaliana, Oryza sativa, and Zea mays from Ensembl Plants. Our tool showed better sensitivity and accuracy than the other tools, which may indicate that our method yields better results for ncRNA annotation. |
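
The abstract describes combining a sequence-based search with a structure-based search but does not say how the two result sets are reconciled. The sketch below is one possible illustration, not the dissertation's method: it assumes hits from both strategies have already been parsed into simple records, and it groups overlapping hits on the same sequence into loci while recording which strategy supports each locus. The `Hit` class, the `combine` function, the field names, and the example labels are all hypothetical, and strand is ignored for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One candidate ncRNA hit. All field names are hypothetical."""
    seq_id: str   # identifier of the searched sequence (e.g. a chromosome)
    start: int    # 1-based inclusive start on that sequence
    end: int      # 1-based inclusive end on that sequence
    source: str   # "sequence" (alignment search) or "structure" (structural search)
    label: str    # placeholder for the database entry the hit points to


def combine(sequence_hits, structure_hits):
    """Group overlapping hits into loci and track the supporting strategies."""
    hits = sorted(
        list(sequence_hits) + list(structure_hits),
        key=lambda h: (h.seq_id, h.start, h.end),
    )
    loci = []
    for hit in hits:
        last = loci[-1] if loci else None
        # Hits are sorted by start, so overlap with the current locus
        # only requires comparing against the locus end.
        if last and last["seq_id"] == hit.seq_id and hit.start <= last["end"]:
            last["end"] = max(last["end"], hit.end)
            last["sources"].add(hit.source)
            last["labels"].append(hit.label)
        else:
            loci.append({
                "seq_id": hit.seq_id,
                "start": hit.start,
                "end": hit.end,
                "sources": {hit.source},
                "labels": [hit.label],
            })
    return loci


# Made-up example data, for illustration only.
sequence_hits = [
    Hit("chr1", 100, 180, "sequence", "example_entry_A"),
    Hit("chr1", 500, 560, "sequence", "example_entry_B"),
]
structure_hits = [
    Hit("chr1", 150, 220, "structure", "example_family_X"),
    Hit("chr2", 10, 90, "structure", "example_family_Y"),
]

loci = combine(sequence_hits, structure_hits)
for locus in loci:
    print(locus["seq_id"], locus["start"], locus["end"], sorted(locus["sources"]))
```

Under these assumptions, a locus supported by both strategies can be reported with higher priority than one supported by a single strategy; a real pipeline would also need to account for strand and score thresholds.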