Bibliographic details
Year of defense: |
2020 |
Main author: |
Silva, José Cleydson Ferreira da |
Advisor: |
Not provided by the institution |
Defense committee: |
Not provided by the institution |
Document type: |
Thesis
|
Access type: |
Open access |
Language: |
eng |
Defending institution: |
Universidade Federal de Viçosa
|
Graduate program: |
Not provided by the institution
|
Department: |
Not provided by the institution
|
Country: |
Not provided by the institution
|
Keywords in Portuguese: |
|
Access link: |
https://locus.ufv.br//handle/123456789/28726
|
Abstract: |
Machine learning (ML) is a field of artificial intelligence that has rapidly emerged in plant molecular biology, allowing the exploitation of massive data. The main challenges are to analyze massive datasets and extract new knowledge of cellular systems. Here, we present a systematic review to disentangle the ML approaches relevant for plant scientists (Chapter 1). We present the main steps of ML development, including data selection, feature extraction, training of algorithms, and evaluation of classification/prediction models, indicating the role of ML algorithms in the post-genomic era. Additionally, based on the systematic review, we developed a machine learning framework for the prediction of cell surface receptors (Chapter 2). Two classes of cell surface receptors, designated receptor-like protein kinases (RLKs) and receptor-like proteins (RLPs), are essential for perceiving and processing external and internal signals in plants and animals. Both are involved in plant development and pathogen responses and share a similar extracellular domain, capable of initially sensing environmental signals. However, RLPs have short divergent C-terminal regions not associated with the conserved kinase domain characteristic of RLKs. The absence of C-terminal phylogenetic relationships between RLKs and RLPs precludes the use of sequence comparison algorithms for high-throughput predictions of the RLP family. Thus, we developed the first RLP predictor in plants, designated RLPredictiOme. RLPredictiOme was implemented based on machine learning models associated with Bayesian inference. The ML models were developed in three stages: to distinguish RLPs from non-RLPs, to distinguish RLPs from RLKs, and to classify new subfamilies of RLPs in plants. The evaluation of the models resulted in high accuracy, precision, sensitivity, and specificity, and in relatively high probabilities, ranging from 0.79 to 0.99, for RLP predictions. 
In addition, a complete validation of RLPredictiOme was performed with previously characterized Arabidopsis RLPs and with LRR-RLPs from Arabidopsis and rice, and more than 90% of known RLPs were correctly predicted. Beyond predicting previously characterized RLPs, RLPredictiOme uncovered new RLP subfamilies in the Arabidopsis genome. These include the probable lipid transfer (PLT)-RLP, plastocyanin-like-RLP, ring finger-RLP, glycosyl-hydrolase-RLP, and glycerophosphoryl diester phosphodiesterase (GDPDL)-RLP subfamilies, which are yet to be characterized. In comparison with the only Arabidopsis GDPDL-RLK, molecular evolution studies indicated that the ectodomain of GDPDL-RLPs from Arabidopsis might have undergone purifying selection with a predominance of synonymous substitutions. Expression analyses revealed that the predicted GDPDL-RLPs display a basal level of expression and respond to developmental and biotic signals. The results of these biological assays substantiate the notion that the members of this subfamily have maintained functional domains during evolution and may play relevant roles in development and plant defense. Therefore, RLPredictiOme can provide new insights into the functional role of surface receptors and their relationships with different biological processes. Keywords: Machine learning. Receptor-like protein. RLPredictiOme. |