Exploration of non-coding RNA in complex microbial communities with machine learning

Bibliographic details
Defense year: 2024
Main author: Santos, Anderson Paulo Avila
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: eng
Defending institution: Biblioteca Digitais de Teses e Dissertações da USP
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://www.teses.usp.br/teses/disponiveis/55/55134/tde-19082024-091042/
Abstract: The rise of high-throughput technologies has ushered in a new era in the analysis of complex biological systems, in which advanced machine learning techniques are used to unravel the intricate interactions studied in systems biology. This research focuses on the analysis of metagenomic data and the classification of non-coding RNA (ncRNA), which is crucial for understanding regulatory mechanisms in various biological processes. Two specialized metadata databases, MarineMetagenomeDB and AnimalAssociatedMetagenomeDB, were developed to support research in microbial biogeography and in animal-associated microbial functions, addressing the challenges caused by poorly annotated metadata in public repositories. These platforms make it easier to identify and use metagenomic datasets effectively and accurately. A central aspect of the study is the implementation of BioDeepFuse, a machine learning method that integrates deep learning techniques and feature extraction to enhance the classification of ncRNA sequences. This method has shown significant improvements in the accuracy of ncRNA classification, validated against benchmark datasets and real-world samples, reinforcing its utility in practical applications. The research also emphasizes the importance of integrative approaches to multi-omics analyses that combine extensive datasets with sophisticated machine learning techniques. Such approaches allow for a deeper understanding of complex microbial communities and promote practices that ensure data accessibility and reusability, in line with the FAIR principles. Finally, a web platform offers access to a vast collection of predicted ncRNA sequences, facilitating the analysis of nucleic acid sequences. This tool helps overcome traditional barriers in computational research, promoting a more connected and inclusive research environment.