Using constrained clustering and spectral clustering for enzyme data integration

Bibliographic details
Defense year: 2011
Main author: Elisa Boari de Lima
Advisor: Not provided by the institution
Examining committee: Not provided by the institution
Document type: Master's thesis
Access type: Open access
Language: Portuguese
Defending institution: Universidade Federal de Minas Gerais (UFMG)
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://hdl.handle.net/1843/SLSS-8GQGQC
Abstract: When multiple data sources are available for data mining, they usually have to be integrated a priori, before mining begins. This integration step can be costly and may still yield poor results, because important information is likely to be discarded along the way. In this master's thesis, we propose constrained clustering and spectral clustering as strategies for integrating data sources without discarding any of their information. The approach keeps a main database and uses the complementary data sources in one of two ways: as constraints that the constrained clustering algorithms must satisfy, or to increase the similarity between pairs of objects for the spectral clustering algorithms.

As a concrete application of this approach, we focus on enzyme function prediction, a hard problem that is usually addressed through intensive experimental work. We use constrained and spectral clustering to integrate information from diverse sources, and analyze how this additional information affects clustering quality in an enzyme clustering scenario. Our results show that the additional information generally improves clustering quality compared with clustering based on the main database alone.

Keywords: constrained clustering, data integration, enzyme clustering, spectral clustering.
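This record does not say which specific algorithms the thesis used, so the following is only a minimal sketch of the two integration strategies the abstract describes, built from standard techniques chosen for illustration. For constrained clustering it uses a COP-KMeans-style k-means (Wagstaff et al., 2001), where a complementary source is assumed to be encoded as must-link and cannot-link pairs. For spectral clustering it uses normalized spectral clustering in the style of Ng, Jordan and Weiss, where a complementary source is assumed to raise the affinity of related pairs. The function names, the farthest-first seeding, the Gaussian affinity, and the additive `boost` rule are hypothetical choices, not the author's method.

```python
import numpy as np

# Illustrative sketch only; not the thesis's implementation.


def _farthest_first(X, k):
    """Deterministic seeding: start at X[0], then repeatedly pick the point
    farthest from all centers chosen so far."""
    idx = [0]
    dist = np.linalg.norm(X - X[0], axis=1)
    for _ in range(1, k):
        nxt = int(np.argmax(dist))
        idx.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return X[idx].copy()


def cop_kmeans(X, k, must_link=(), cannot_link=(), n_iter=100):
    """COP-KMeans-style clustering: k-means in which every assignment must
    respect the must-link / cannot-link pairs. Returns cluster labels, or
    None when some point has no feasible cluster."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    ml = {i: set() for i in range(n)}
    cl = {i: set() for i in range(n)}
    for a, b in must_link:
        ml[a].add(b)
        ml[b].add(a)
    for a, b in cannot_link:
        cl[a].add(b)
        cl[b].add(a)
    centers = _farthest_first(X, k)
    labels = np.full(n, -1)
    for _ in range(n_iter):
        new = np.full(n, -1)
        for i in range(n):
            # Try clusters from the nearest center to the farthest one.
            for c in np.argsort(np.linalg.norm(centers - X[i], axis=1)):
                # Only points already assigned in this pass (label != -1)
                # can violate a constraint, as in the original COP-KMeans.
                ok_ml = all(new[j] in (-1, c) for j in ml[i])
                ok_cl = all(new[j] != c for j in cl[i])
                if ok_ml and ok_cl:
                    new[i] = c
                    break
            if new[i] == -1:
                return None  # constraints cannot be satisfied for point i
        if np.array_equal(new, labels):
            break
        labels = new
        # Recompute centers; an empty cluster keeps its previous center.
        centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                            else centers[c] for c in range(k)])
    return labels


def gaussian_affinity(X, sigma=1.0):
    """Affinity from the main source: Gaussian kernel, zero diagonal."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def boost_affinity(W, related_pairs, boost=0.5):
    """Hypothetical integration rule: add `boost` to the similarity of every
    pair that a complementary source marks as related (kept symmetric)."""
    W = W.copy()
    for a, b in related_pairs:
        W[a, b] += boost
        W[b, a] += boost
    return W


def spectral_clustering(W, k):
    """Normalized spectral clustering (Ng-Jordan-Weiss style). Assumes every
    row of the affinity matrix W has a positive sum."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # D^-1/2 W D^-1/2
    _, vecs = np.linalg.eigh(M)                # eigenvalues in ascending order
    U = vecs[:, -k:]                           # k leading eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize
    return cop_kmeans(U, k)                    # no constraints: plain k-means
```

The two functions reflect the two integration routes differently: `cop_kmeans` treats the complementary information as hard rules and returns `None` when its greedy assignment cannot satisfy them, whereas `boost_affinity` followed by `spectral_clustering` never rejects a partition and only reweights how similar objects look to the algorithm.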