Clustering of gene expression data samples using two biologically inspired similarity metrics

Bibliographic details
Year of defense: 2008
Main author: Saulo Augusto de Paula Pinto
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: por
Institution of defense: Universidade Federal de Minas Gerais
Brazil
ICB - DEPARTAMENTO DE BIOQUÍMICA E IMUNOLOGIA
Programa de Pós-Graduação em Bioinformatica
UFMG
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://hdl.handle.net/1843/55791
Abstract: Clustering algorithms are among the most widely used techniques in gene expression data analysis. As an exploratory technique, clustering lets researchers find similar expression patterns across the sampled tissues and identify which sampled conditions are more alike than others. This work presents two methods for computing the similarity between whole gene expression samples using only a fraction of the most expressed sequences (MESs) in each sample. Both similarity metrics are computed from the expression ordering of the sequences present in the samples. The first, named MESs similarity, favors the sharing of the most expressed sequences. The second, named MESs ordering conservation, favors the preservation of the sequences' expression ordering. Hierarchical clustering with the proposed similarity metrics was applied to 18 gene expression data series totaling 612 samples, and the results were compared with those produced by traditional metrics such as Euclidean distance and the Pearson and Spearman correlations. Overall, the two proposed metrics outperformed the others: MESs similarity reached 89% accuracy and MESs ordering conservation 80%, whereas the best traditional metric on the same data, Pearson correlation, yielded 76% accuracy. These results indicate that the proposed metrics are an alternative to the traditional ones. In addition, they produce data that reflect biologically significant features of the sampled systems.
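The abstract names the two metrics but does not give their formulas. The Python sketch below only illustrates the general idea of comparing samples through their top-ranked sequences; it is not the thesis's method. The function names, the cutoff `k`, both formulas (shared fraction of the top-k sets, and fraction of shared sequence pairs ranked in the same order), and the toy expression values are all assumptions made for this example.

```python
from itertools import combinations


def top_sequences(sample, k):
    """Return the k most expressed sequence IDs, highest expression first.

    `sample` maps sequence ID -> expression value (hypothetical layout).
    Ties are broken by sequence ID so the ranking is deterministic.
    """
    ranked = sorted(sample, key=lambda seq: (-sample[seq], seq))
    return ranked[:k]


def mes_sharing(a, b, k):
    """Assumed formula: fraction of the top-k sequences shared by both samples."""
    top_a = set(top_sequences(a, k))
    top_b = set(top_sequences(b, k))
    return len(top_a & top_b) / k


def mes_order_conservation(a, b, k):
    """Assumed formula: among sequences in both top-k lists, the fraction of
    pairs that appear in the same relative order in the two samples."""
    top_a = top_sequences(a, k)
    rank_b = {seq: i for i, seq in enumerate(top_sequences(b, k))}
    shared = [seq for seq in top_a if seq in rank_b]  # kept in a's order
    pairs = list(combinations(shared, 2))
    if not pairs:
        return 0.0  # fewer than two shared sequences: no ordering to compare
    concordant = sum(rank_b[x] < rank_b[y] for x, y in pairs)
    return concordant / len(pairs)


# Toy samples with made-up expression values, for illustration only.
liver_1 = {"g1": 90, "g2": 70, "g3": 50, "g4": 5, "g5": 1}
liver_2 = {"g1": 80, "g2": 85, "g3": 40, "g4": 2, "g5": 3}
brain_1 = {"g1": 2, "g2": 4, "g3": 60, "g4": 95, "g5": 75}

print(mes_sharing(liver_1, liver_2, k=3))
print(mes_sharing(liver_1, brain_1, k=3))
print(mes_order_conservation(liver_1, liver_2, k=3))
```

Under these assumptions, a value such as `1 - mes_sharing(a, b, k)` could serve as a pairwise dissimilarity for a standard hierarchical clustering routine, which is roughly how a rank-based metric would take the place of Euclidean distance or a correlation coefficient.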