Bibliographic details
Year of defense: 2018
Main author: RIZO RODRÍGUEZ, Sara Inés
Advisor: CARVALHO, Francisco de Assis Tenório de
Examining committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: English (eng)
Defense institution: Universidade Federal de Pernambuco
Graduate program: Programa de Pós-Graduação em Ciência da Computação
Department: Not provided by the institution
Country: Brazil
Access link: https://repositorio.ufpe.br/handle/123456789/30523
Abstract:
Data clustering is one of the most important problems in data mining and machine learning. Clustering is the task of discovering homogeneous groups among the objects under study, and in recent years many researchers have shown significant interest in developing clustering algorithms. The main difficulty in clustering is that no prior knowledge about the given dataset is available. Traditional clustering approaches are designed to search for clusters in the entire feature space. However, high-dimensional real-world datasets usually contain many dimensions that are irrelevant for clustering, and in such settings traditional clustering methods often perform poorly. Subspace clustering extends traditional clustering by searching for clusters only in the relevant dimensions of a dataset. However, most subspace clustering methods depend on complicated parameter settings that are hard to determine, which makes them difficult to use in practical applications. This work proposes a partitioning fuzzy clustering algorithm with entropy regularization and automatic variable selection through adaptive distances, in which the dissimilarity measure is the sum of the Euclidean distances between objects and prototypes computed individually for each variable. The main advantage of the proposed approach over conventional clustering methods is its use of adaptive distances, which change at each iteration of the algorithm. This type of dissimilarity measure makes it possible to learn the weights of the variables dynamically during the clustering process, which improves the performance of the algorithm. Another advantage of the proposed approach is the entropy regularization term, which acts as a regulating factor during the minimization process. The proposed method is an iterative three-step algorithm that provides a fuzzy partition and a representative for each fuzzy cluster.
To this end, the algorithm minimizes an objective function that combines a multidimensional distance function, used as the dissimilarity measure, with an entropy regularization term. Experiments on simulated, real-world, and image data corroborate the usefulness of the proposed algorithm.
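The three alternating steps can be illustrated with a small plain-Python sketch. This is a hedged illustration under stated assumptions, not the author's implementation. It assumes the objective J = sum_i sum_k u_ik * sum_j lambda_kj * (x_ij - g_kj)^2 + T * sum_i sum_k u_ik * ln(u_ik). Here the memberships u_ik of each object sum to 1, and each cluster's variable weights lambda_kj are constrained so that their product over the variables equals 1, a constraint commonly used with adaptive distances. The temperature T, the step order (prototypes, then weights, then memberships), the per-cluster form of the weights, all function names, and the toy data are assumptions made only for this example.

```python
import math


def adaptive_distance(x, g, lam):
    """Sum over variables of lambda_j * (x_j - g_j)^2 (per-variable squared Euclidean terms)."""
    return sum(l * (xj - gj) ** 2 for xj, gj, l in zip(x, g, lam))


def update_prototypes(X, U):
    """Representation step: g_kj = sum_i u_ik x_ij / sum_i u_ik (the weights cancel out)."""
    n_clusters, n_vars = len(U[0]), len(X[0])
    G = []
    for k in range(n_clusters):
        total = sum(u[k] for u in U)
        G.append([sum(u[k] * x[j] for u, x in zip(U, X)) / total for j in range(n_vars)])
    return G


def update_weights(X, U, G, eps=1e-12):
    """Weighting step: lambda_kj = (prod_h D_kh)^(1/p) / D_kj, so prod_j lambda_kj = 1.

    D_kj is the membership-weighted dispersion of variable j inside cluster k;
    eps guards against a zero dispersion. Computed in log space to avoid overflow.
    """
    L = []
    for k, g in enumerate(G):
        D = [eps + sum(u[k] * (x[j] - g[j]) ** 2 for u, x in zip(U, X))
             for j in range(len(g))]
        mean_log = sum(math.log(d) for d in D) / len(D)
        L.append([math.exp(mean_log - math.log(d)) for d in D])
    return L


def update_memberships(X, G, L, T):
    """Allocation step: u_ik = exp(-d_ik / T) / sum_h exp(-d_ih / T)."""
    U = []
    for x in X:
        d = [adaptive_distance(x, g, lam) for g, lam in zip(G, L)]
        d_min = min(d)  # shifting by the minimum keeps exp() from underflowing to all zeros
        e = [math.exp(-(dk - d_min) / T) for dk in d]
        s = sum(e)
        U.append([ek / s for ek in e])
    return U


def objective(X, U, G, L, T):
    """J = sum_i sum_k u_ik d_ik + T * sum_i sum_k u_ik ln(u_ik)."""
    J = 0.0
    for x, u in zip(X, U):
        for uk, g, lam in zip(u, G, L):
            if uk > 0.0:  # both terms vanish as u -> 0
                J += uk * adaptive_distance(x, g, lam) + T * uk * math.log(uk)
    return J


def fit(X, G0, T=1.0, n_iter=20):
    """Alternate the three steps from caller-supplied initial prototypes G0."""
    L = [[1.0] * len(X[0]) for _ in G0]  # equal weights satisfy prod_j lambda_kj = 1
    G = [list(g) for g in G0]
    U = update_memberships(X, G, L, T)
    history = [objective(X, U, G, L, T)]
    for _ in range(n_iter):
        G = update_prototypes(X, U)
        L = update_weights(X, U, G)
        U = update_memberships(X, G, L, T)
        history.append(objective(X, U, G, L, T))
    return U, G, L, history


# Hypothetical toy data: variable 0 separates two groups; variable 1 has a wide, uninformative spread.
A = [[a, b] for a in (-0.2, 0.0, 0.2) for b in (-5.0, 0.0, 5.0)]
B = [[10.0 + a, b] for a in (-0.2, 0.0, 0.2) for b in (-5.0, 0.0, 5.0)]
X = A + B
U, G, L, history = fit(X, G0=[[1.0, 1.0], [9.0, -1.0]], T=1.0)
print([round(w, 3) for w in L[0]])  # prints [25.0, 0.04]
```

Under these assumptions, each step is the exact minimizer of J with the other two blocks held fixed, so J should not increase from one iteration to the next. On the toy data, variable 0 is tight within each group and receives a much larger weight than variable 1, which is the kind of automatic variable selection the abstract describes. A larger T produces softer (fuzzier) memberships. In practice the initial prototypes would typically be chosen at random, with several restarts.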