Bibliographic details
Year of defense: 2018
Main author: Reinoso Vilca, Fabio Ivan
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Dissertation
Access type: Open access
Language: eng
Defending institution: Universidade Federal de Viçosa
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://www.locus.ufv.br/handle/123456789/23926
Abstract:
Bacterial small RNAs (sRNAs) are usually non-coding RNAs (ncRNAs) with a size of 50–500 nucleotides, and act mainly as post-transcriptional regulators. Prediction of sRNAs is a challenging problem in bioinformatics. Current computational tools deliver a high number of false positives. Hence, the development of more precise predictive methods is of fundamental importance to reduce the number of costly and time-consuming sequence validations at the laboratory bench. In this work, we collected a series of features from existing computational tools for ncRNA prediction in order to select the best ones for classifying putative bacterial sRNA sequences. Out of the 264 initially chosen features, 22 relevant and non-redundant features were selected using feature-selection algorithms. To validate this proposal, we used a dataset built only from experimentally validated sRNAs from different bacterial sub-strains, considered model organisms in genetics, as well as non-sRNA sequences. Finally, a Random Forest algorithm was applied for the classification task. Our first validation experiment covered the single-sequence prediction task, using 6 testing sets. Our pipeline presented better results than the only ab initio method we could find in the literature. The differentiating characteristics of our method are its lower computational cost, its dimensionality reduction, and the analytical power afforded by relying on only the 22 selected features. Our approach reached an average accuracy of 80%, precision of 71.28%, specificity of 82.11%, and an area under the ROC curve of 0.879. Furthermore, we presented a genome-wide framework for sRNA prediction, obtaining a 39% lower false positive ratio and twice the specificity of the above-mentioned ab initio method.
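For readers less familiar with the evaluation measures named in the abstract, the following is a minimal sketch of how accuracy, precision, specificity, and the false positive rate are conventionally derived from confusion-matrix counts. The class `ConfusionCounts`, the function names, and the example counts are hypothetical and chosen only for illustration; they are not taken from the thesis and do not reproduce its results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary classifier (hypothetical container for illustration)."""
    tp: int  # sRNA sequences correctly predicted as sRNA
    fp: int  # non-sRNA sequences wrongly predicted as sRNA
    tn: int  # non-sRNA sequences correctly predicted as non-sRNA
    fn: int  # sRNA sequences wrongly predicted as non-sRNA


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all predictions that are correct."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def precision(c: ConfusionCounts) -> float:
    """Fraction of predicted positives that are true positives."""
    return c.tp / (c.tp + c.fp)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of actual negatives that are correctly rejected."""
    return c.tn / (c.tn + c.fp)


def false_positive_rate(c: ConfusionCounts) -> float:
    """Fraction of actual negatives wrongly flagged; equals 1 - specificity."""
    return c.fp / (c.fp + c.tn)


# Illustrative counts only (assumed values, not data from the thesis).
counts = ConfusionCounts(tp=40, fp=10, tn=40, fn=10)

print(f"accuracy    = {accuracy(counts):.3f}")
print(f"precision   = {precision(counts):.3f}")
print(f"specificity = {specificity(counts):.3f}")
print(f"FPR         = {false_positive_rate(counts):.3f}")
```

Because the false positive rate and specificity share the same denominator (all actual negatives), lowering one necessarily raises the other, which is why the abstract reports the two together when comparing against the ab initio baseline.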