Coding of Amino Acid Sequences and Its Application to Protein Classification with Artificial Neural Networks
Year of defense: | 2007 |
---|---|
Main author: | |
Advisor: | |
Examining committee: | |
Document type: | Thesis |
Access type: | Open access |
Language: | Portuguese (por) |
Defending institution: | Universidade Federal de Minas Gerais (UFMG) |
Graduate program: | Not provided by the institution |
Department: | Not provided by the institution |
Country: | Not provided by the institution |
Keywords in Portuguese: | |
Access link: | http://hdl.handle.net/1843/GRFO-7JLQW3 |
Abstract: | This work aims to develop a protein coding system in which sequences with different numbers of amino acids are converted into vectors of the same dimension so that they can be functionally classified by Artificial Neural Networks. The proposed scheme uses sliding windows of predefined length. The windows slide over the sequence and produce a vector containing information about the sequence. The coding method must produce unambiguous vectors, must take into account the similarity between amino acids, and must account for short regions of similarity, with each sliding window weighted in proportion to its length.
We present a study of similarity and dissimilarity measures between amino acid sequences, for which pairwise alignment is the most frequently used metric. Some problems with using pairwise alignment to measure dissimilarity are shown, in cases where other metrics proved more effective. Using these metrics requires a coding scheme, called Sequence Coding by Sliding Window, that generates vectors of the same dimension. This coding scheme was used to classify amino acid sequences with Artificial Neural Networks.
We compare the two coding schemes: amino acid sequences from proteins of 10 bacteria were encoded and used to train Artificial Neural Networks to classify these sequences according to the Clusters of Orthologous Groups (COG). Two groups of sequences derived from proteins of Chromobacterium violaceum and Chlamydophila felis were selected to test the method. The comparison shows the superiority of the proposed coding scheme: the information stored in the resulting vectors allows the Artificial Neural Networks to classify both sets of proteins according to the COG functional classes. Every sequence that the Artificial Neural Networks classified differently had its classification checked by CD-Search alignment against the COG database.
The results showed that some sequences are classified inconsistently in public databases. The Artificial Neural Networks trained with the vectors generated by the E-SCSW scheme correctly classified 184 sequences derived from Chromobacterium violaceum and 94 from Chlamydophila felis.
The main contribution of this work is a new protein coding method for use with Artificial Neural Networks. Verification of the results showed that public repositories contain some inconsistencies and that deposited amino acid sequences should be checked on a regular basis. The proposed coding method can therefore complement traditional protein classification methods, which are based on pairwise alignment. |
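The abstract does not spell out the Sequence Coding by Sliding Window or E-SCSW procedures, so the following is only a minimal sketch, under our own assumptions, of the core property it states: a window of fixed length slides along a sequence of any length, and the result is a vector of fixed dimension. Here each window is a k-mer, and the vector holds the relative frequency of every possible k-mer over the 20 standard amino acids (20**k entries). The names `kmer_index` and `encode_sequence` and the choice of k-mer frequencies are hypothetical; unlike the thesis's scheme, this sketch ignores similarity between amino acids.

```python
from functools import lru_cache
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


@lru_cache(maxsize=None)
def kmer_index(k: int) -> dict:
    """Hypothetical helper: map every possible window of length k to a fixed vector position."""
    return {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=k))}


def encode_sequence(seq: str, k: int = 2) -> list:
    """Hypothetical sliding-window encoding: relative frequency of each k-mer.

    The output always has 20**k entries, whatever the sequence length.
    Windows containing non-standard residues (e.g. 'X') are skipped.
    """
    index = kmer_index(k)
    vector = [0.0] * len(index)
    seq = seq.upper()
    # Slide a window of length k over the sequence, one residue at a time.
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    valid = [w for w in windows if w in index]
    for w in valid:
        vector[index[w]] += 1.0
    # Normalize by the number of windows so long and short proteins are comparable.
    if valid:
        vector = [c / len(valid) for c in vector]
    return vector


if __name__ == "__main__":
    short = encode_sequence("MKV", k=2)            # windows: MK, KV
    longer = encode_sequence("MKVLAAGIVGLLLA", k=2)
    print(len(short), len(longer))                 # both have 400 entries
```

Because every protein maps to a vector of the same size, such vectors can be fed directly to a neural network with a fixed input layer. A scheme like the one the abstract describes would additionally need to account for substitutions between similar amino acids, which this sketch does not do.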