Modelagem e decomposição de redes de coevolução de aminoácidos: aplicações em determinação de especificidade e anotação de proteínas
Defense year: | 2020 |
---|---|
Lead author: | |
Advisor: | |
Defense committee: | |
Document type: | Thesis |
Access type: | Open access |
Language: | por |
Defense institution: | Universidade Federal de Minas Gerais Brasil ICB - INSTITUTO DE CIÊNCIAS BIOLOGICAS Programa de Pós-Graduação em Bioinformatica UFMG |
Graduate program: | Not provided by the institution |
Department: | Not provided by the institution |
Country: | Not provided by the institution |
Keywords in Portuguese: | |
Access link: | http://hdl.handle.net/1843/50711 |
Abstract: | Computational molecular evolution analyses are usually performed on multiple sequence alignments of homologous sequences, in which sequences that likely originated from a common ancestor are aligned in such a way that equivalent amino acids are placed in the same column. Conserved residues in a multiple sequence alignment can be highly informative, since they suggest positions under evolutionary selection and constraint. Most methods proposed for detecting coevolution and specificity determinant sites focus on finding positions; they may therefore ignore sites that are specific to a subfamily but variable across the whole alignment, or they require prior knowledge about the families under study, such as a list of subfamilies or phylogenetic trees. This project presents a network-based methodology, of a kind commonly applied to social and ecological systems, with the goal of identifying clusters of functionally related residues. The method was first validated using artificial datasets and then applied to four real protein families: C-type Lysozyme/Alpha-lactalbumin, HIUase/Transthyretin, Amidases and the class A G protein-coupled receptors. Patterns of specificity determinant sets for many functional subclasses were successfully extracted from all of these families. These networks were then used as features for a support vector machine (SVM) that was able to correctly classify even subfamilies without detected specificity determinant residues. This classifier was also applied to the orphan GPCRs, generating novel hypotheses about these proteins. We developed a web application to promote and facilitate studies based on the methodology proposed in this project; the system can generate a series of data visualizations and cross-references with external archives. Finally, we created a database of specificity determinant sites that includes precalculated analyses of datasets extracted from Pfam. In addition to generating many intuitive and dynamic reports, this database offers a REST API that allows programmatic access to its data. |