Improving the efficiency of k-medoids algorithms using metric access methods

Teixeira, Larissa Roberta

Bibliographic details
Year of defense:	2024
Main author:	Teixeira, Larissa Roberta
Advisor:	Not provided by the institution
Examination committee:	Not provided by the institution
Document type:	Master's dissertation
Access type:	Open access
Language:	English
Defense institution:	Biblioteca Digitais de Teses e Dissertações da USP
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords:	Clustering, Dimensional data, Indexing, k-medoids, Metric access method
Access link:	https://www.teses.usp.br/teses/disponiveis/55/55134/tde-27082024-144742/
Abstract:	In the early days of computing, data processing techniques and tools were developed to handle scalar data types. However, technological advances have brought significant growth in both the volume and the complexity of data, which has made it necessary to develop techniques that can efficiently handle complex data types. Here, we call complex those data types that have no predefined way of being compared, as is the case for comparisons based on similarity. Among the strategies in the literature, clustering techniques stand out as a promising approach for identifying patterns in data by forming groups. Among clustering algorithms, k-medoids-based methods have emerged as one of the most widely used approaches. However, these methods incur high computational costs when applied to large datasets. Despite numerous efforts in the literature to optimize k-medoids algorithms, they still face limitations on large datasets, particularly when the data are complex. This is primarily because they need to compute and store a distance matrix in memory, which makes them impractical for voluminous datasets. In this master's research, we propose KluSIM, a novel approach that improves the computational efficiency of the swap step in k-medoids algorithms. KluSIM employs metric access methods to prune the search space, significantly accelerating the swap step. Additionally, KluSIM eliminates the need to keep a distance matrix in main memory, overcoming the memory limitations of existing methods. Overall, the experiments show that KluSIM effectively optimizes the swap step of k-medoids algorithms, substantially reducing the number of distance calculations required during clustering. Furthermore, KluSIM proved scalable and effective for clustering on the tested datasets, which makes it applicable to big data tasks.
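The abstract does not describe KluSIM's internals. What follows is a minimal Python sketch of the two ideas it names, assuming a PAM-style swap step. It is not KluSIM. The first idea is to compute distances on demand and cache only O(n) values per point instead of storing an n × n distance matrix. The second is to use a metric property to skip distance computations during the swap evaluation. As a simple stand-in for a metric access method, the sketch uses a single pivot p and the triangle-inequality lower bound d(x, c) ≥ |d(x, p) − d(c, p)|. All names (`best_swap`, `nearest_two`, `CountingMetric`, `pivot_idx`) are hypothetical.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


class CountingMetric:
    """Wraps any metric and counts evaluations (the cost the pruning tries to cut)."""

    def __init__(self, metric: Metric) -> None:
        self.metric = metric
        self.calls = 0

    def __call__(self, a: Point, b: Point) -> float:
        self.calls += 1
        return self.metric(a, b)


def nearest_two(x: Point, medoids: Sequence[Point], dist: Metric) -> Tuple[int, float, float]:
    """Return (position of nearest medoid, distance to it, distance to second nearest)."""
    best, d1, d2 = -1, math.inf, math.inf
    for j, m in enumerate(medoids):
        d = dist(x, m)
        if d < d1:
            best, d1, d2 = j, d, d1
        elif d < d2:
            d2 = d
    return best, d1, d2


def best_swap(data: List[Point], medoid_idx: List[int], dist: Metric,
              pivot_idx: int) -> Tuple[float, Optional[Tuple[int, int]]]:
    """Evaluate all (medoid, non-medoid) swaps without a distance matrix.

    Returns (best cost change, (medoid position, candidate index)), or (0.0, None)
    if no swap lowers the total cost. Hypothetical sketch, not KluSIM itself.
    """
    k = len(medoid_idx)
    medoids = [data[i] for i in medoid_idx]
    cache = [nearest_two(x, medoids, dist) for x in data]    # O(n) memory
    to_pivot = [dist(x, data[pivot_idx]) for x in data]      # one pivot "index"
    is_medoid = set(medoid_idx)
    best_delta, best_pair = 0.0, None
    for ci, c in enumerate(data):
        if ci in is_medoid:
            continue
        deltas = [0.0] * k  # deltas[j]: cost change if medoid j is replaced by c
        for xi, x in enumerate(data):
            n, d1, d2 = cache[xi]
            if abs(to_pivot[xi] - to_pivot[ci]) >= d2:
                # Lower bound proves d(x, c) >= d2, so d(x, c) is never computed:
                # x only changes cost (to d2) if its own medoid is removed.
                deltas[n] += d2 - d1
                continue
            dxc = dist(x, c)
            if dxc < d1:
                # x moves to c whichever medoid is removed.
                for j in range(k):
                    deltas[j] += dxc - d1
            else:
                # Only removing x's own medoid changes its cost.
                deltas[n] += min(dxc, d2) - d1
        for j in range(k):
            if deltas[j] < best_delta:
                best_delta, best_pair = deltas[j], (j, ci)
    return best_delta, best_pair


if __name__ == "__main__":
    rng = random.Random(7)
    centers = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0)]
    data = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)) for cx, cy in centers for _ in range(20)]
    dist = CountingMetric(math.dist)
    delta, pair = best_swap(data, [0, 20, 40], dist, pivot_idx=0)
    print("best delta:", delta, "swap:", pair, "distance calls:", dist.calls)
```

Without pruning, one pass would cost n·k + n + (n − k)·n distance evaluations in this sketch. The pivot bound skips d(x, c) whenever it cannot change any swap's cost, so the result stays exact. Real metric access methods such as M-tree or Slim-tree use many pivots or a hierarchy to obtain tighter bounds. This single-pivot version only illustrates the principle.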