Análise de Similaridade de Sequências Genômicas

Bibliographic details
Year of defense: 2013
Main author: Fonseca, Ítallo Costa
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Dissertation
Access type: Open access
Language: por
Defending institution: Universidade Federal da Paraíba
BR
Física
Programa de Pós-Graduação em Física
UFPB
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://repositorio.ufpb.br/jspui/handle/tede/5738
Abstract: In this dissertation, we investigate aspects of similarity between complete mitochondrial DNA sequences. This line of study belongs to the analysis of the statistical properties of DNA sequences, based on methods that seek to understand the information contained in these sequences, a topic of renewed interest in the context of so-called Complex Systems. Previous approaches obtained the frequencies of certain nucleotide segments, regarded as words of a given size, contained in the sequences. These methods, inspired by studies of the statistical properties of word distributions in linguistic and symbolic sequences, can be considered an alternative to sequence-alignment techniques and algorithms. They have been successful in describing characteristics that allow one to infer similarity and possible criteria for grouping species, that is, biological affinity between DNA sequences. Previously, this methodology has been applied to evaluate the differences between coding and noncoding DNA sequences and to extract linguistic aspects of these sequences by detecting keywords that describe relevant information embedded in them. In this dissertation, these studies are extended to directly compare the contents of pairs of complete mitochondrial DNA sequences, defining parameters that depend on the frequency distribution of words in the sequences and that highlight both the relevance of certain words and the possibility of grouping species by estimating the distance between these words. Our results show that the best clusters of different species are obtained when the agglomeration rate is calculated considering only word frequencies. We have also observed that the larger the word size, the stronger the clustering between sequences.
The prospect of applying our results to the analysis of DNA sequences that belong to a single biological species may be relevant to the construction of phylogenetic trees, which are appropriate structures for understanding the evolutionary history of organisms.
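
The abstract describes comparing sequences through the frequencies of fixed-size nucleotide words, without alignment. The general idea might be sketched as follows. This is only an illustration under stated assumptions: the function names `word_frequencies` and `frequency_distance` are hypothetical, the Euclidean distance is a generic stand-in (the record does not define the dissertation's parameters or its agglomeration rate), and the sequences are made-up toy strings, not mitochondrial data.

```python
from collections import Counter
from math import sqrt


def word_frequencies(seq, k):
    """Relative frequencies of the overlapping words of length k in seq."""
    seq = seq.upper()
    if k <= 0 or len(seq) < k:
        raise ValueError("sequence must contain at least one word of size k")
    # Slide a window of width k along the sequence, one nucleotide at a time.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def frequency_distance(seq_a, seq_b, k):
    """Euclidean distance between two word-frequency profiles.

    Assumption: this metric is chosen only for illustration; it is not
    taken from the dissertation.
    """
    freq_a = word_frequencies(seq_a, k)
    freq_b = word_frequencies(seq_b, k)
    words = set(freq_a) | set(freq_b)
    return sqrt(sum((freq_a.get(w, 0.0) - freq_b.get(w, 0.0)) ** 2
                    for w in words))


if __name__ == "__main__":
    # Toy sequences, invented for this example.
    a = "ACGTACGTACGGTA"
    b = "ACGTACGAACGGTT"
    for k in (2, 3, 4):
        print(k, round(frequency_distance(a, b, k), 4))
```

A matrix of such pairwise distances between species could then be passed to any standard hierarchical clustering routine to explore groupings. How those groupings change with the word size `k` is the kind of question the abstract reports on.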