Detalhes bibliográficos
Ano de defesa: |
2014 |
Autor(a) principal: |
Ferrari, Daniel Gomes
 |
Orientador(a): |
Silva, Leandro Nunes de Castro
 |
Banca de defesa: |
Não Informado pela instituição |
Tipo de documento: |
Tese
|
Tipo de acesso: |
Acesso aberto |
Idioma: |
por |
Instituição de defesa: |
Universidade Presbiteriana Mackenzie
|
Programa de Pós-Graduação: |
Não Informado pela instituição
|
Departamento: |
Não Informado pela instituição
|
País: |
Não Informado pela instituição
|
Palavras-chave em Português: |
|
Palavras-chave em Inglês: |
|
Área do conhecimento CNPq: |
|
Link de acesso: |
http://dspace.mackenzie.br/handle/10899/24259
|
Resumo: |
Data clustering is an important data mining task that aims to segment a database into groups of objects based on their similarity or dissimilarity. Due to the unsupervised nature of clustering, the search for a good quality solution can become a complex process. There is currently a wide range of clustering algorithms and selecting the most suitable one for a given problem can be a slow and costly process. In 1976, Rice formulated the algorithm selection problem (PSA) postulating that a good performance algorithm can be chosen according to the problem s structural characteristics. Meta-learning brings the concept of learning about learning, that is, the meta-knowledge obtained from the algorithms learning process allows it to improve its performance. Meta-learning has a major intersection with data mining in classification problems, where it is used to select algorithms. This thesis proposes an approach to the algorithm selection problem by using meta-learning techniques for clustering. The characterization of 84 problems is performed by a classical approach, based on the problems, and a new proposal based on the similarity among the objects. Ten internal indices are used to provide different performance assessments of seven algorithms, where the combination of the indices determine the ranking for the algorithms. Several analyzes are performed in order to assess the quality of the obtained meta-knowledge in facilitating the mapping between the problem s features and the performance of the algorithms. The results show that the new characterization approach and method to combine the indices provide a good quality algorithm selection mechanism for data clustering problems. |