Avaliação do impacto da seleção de partições base em ensemble multiobjetivo (Evaluation of the impact of base partition selection in multi-objective ensemble)
Year of defense: | 2018 |
---|---|
Main author: | |
Advisor: | |
Defense committee: | |
Document type: | Dissertation |
Access type: | Open access |
Language: | por |
Defending institution: |
Universidade Federal de São Carlos
Câmpus Sorocaba |
Graduate program: |
Programa de Pós-Graduação em Ciência da Computação - PPGCC-So
|
Department: |
Not provided by the institution
|
Country: |
Not provided by the institution
|
Keywords in Portuguese: | |
Keywords in English: | |
CNPq knowledge area: | |
Access link: | https://repositorio.ufscar.br/handle/20.500.14289/9926 |
Abstract: | Unsupervised data clustering is not a trivial process, as no prior knowledge is available and real data is often complex and multi-faceted. To make matters worse, clustering traditionally aims to describe the data being explored from a single perspective. However, it is widely known that in several cases this approach seriously limits what can be extracted from the analysis. Furthermore, changes in parameters and preprocessing techniques can dramatically change the final result, either by revealing or by hiding a possible plural meaning present in the data. To tackle some of these issues, recent approaches that build knowledge from multiple base partitions, such as ensemble clustering, have emerged. However, special care must be taken in the composition of those partitions, as their quality and diversity have proved to be closely related to the resulting performance. To enhance the quality and diversity of those partitions, and thereby provide better results, a number of methods to evaluate and select a subset of the partitions have been proposed and successfully applied. In this work, we expand this discussion by evaluating the impact of some of the state-of-the-art selection methods in the novel context of multi-objective cluster ensemble. In this context, our analysis shows improvements in two important issues: (i) the results are more concise, which facilitates subsequent manual analysis, and (ii) they are obtained with less computational effort. Both improvements are achieved without affecting the quality of the results. |