Bibliographic details
Year of defense: |
2024 |
Main author: |
Teixeira, Larissa Roberta |
Advisor: |
Not provided by the institution |
Defense committee: |
Not provided by the institution |
Document type: |
Master's dissertation
|
Access type: |
Open access |
Language: |
eng |
Defending institution: |
Biblioteca Digitais de Teses e Dissertações da USP
|
Graduate program: |
Not provided by the institution
|
Department: |
Not provided by the institution
|
Country: |
Not provided by the institution
|
Keywords in Portuguese: |
|
Access link: |
https://www.teses.usp.br/teses/disponiveis/55/55134/tde-27082024-144742/
|
Abstract: |
In the early days of computing, data processing techniques and tools were developed to deal with scalar data types. With technological advances, however, both the volume and the complexity of data have grown significantly, which calls for techniques that can handle complex data types efficiently. Here, we call a data type complex when there is no predefined way to compare its elements, as is the case for comparisons based on similarity. Among the strategies in the literature, clustering techniques stand out as a promising approach for identifying patterns in data by forming groups. Among clustering algorithms, k-medoids-based methods are some of the most widely used. However, these methods have high computational costs when applied to large datasets. Despite numerous efforts in the literature to optimize k-medoids algorithms, they remain limited on large datasets, particularly when the data are complex. This is primarily because they need to compute a distance matrix and store it in main memory, which makes them impractical for voluminous datasets. This master's research proposes the KluSIM algorithm, a novel approach to improve the computational efficiency of the swap step in k-medoids algorithms. KluSIM employs access methods to prune the search space, significantly accelerating the swap step. Additionally, KluSIM eliminates the need to maintain a distance matrix in main memory, overcoming the memory limitations of existing methods. Overall, the experiments show that KluSIM effectively optimizes the swap step of k-medoids algorithms, substantially reducing the number of distance calculations required during the clustering process. Furthermore, KluSIM proved scalable and effective for clustering on the tested datasets, which suggests it can be applied to big data tasks. |
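As a rough illustration of the cost the abstract refers to, the following is a minimal sketch of a baseline, exhaustive swap step for k-medoids. It is not KluSIM and does no pruning with access methods; it only shows a swap pass that evaluates distances on demand instead of reading a stored distance matrix, and counts how many distance calculations that takes. The names `total_cost`, `swap_step`, and `abs_dist`, as well as the toy one-dimensional data, are assumptions made for this example and do not come from the dissertation.

```python
from itertools import product

def total_cost(points, medoids, dist):
    """Sum of distances from each point to its nearest medoid (medoids are indices)."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)

def swap_step(points, medoids, dist):
    """One exhaustive swap pass: try every (medoid slot, non-medoid) swap, keep the best."""
    best, best_cost = list(medoids), total_cost(points, medoids, dist)
    for slot, cand in product(range(len(medoids)), range(len(points))):
        if cand in medoids:
            continue  # a current medoid cannot be swapped in
        trial = list(medoids)
        trial[slot] = cand
        cost = total_cost(points, trial, dist)
        if cost < best_cost:
            best, best_cost = trial, cost
    return best, best_cost

calls = 0  # hypothetical counter of distance calculations

def abs_dist(a, b):
    """Toy distance on scalars; a complex data type would supply its own function."""
    global calls
    calls += 1
    return abs(a - b)

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
medoids, cost = swap_step(points, [0, 1], abs_dist)
print(medoids, cost, calls)
```

Even on six points with two medoids, this single pass performs over a hundred distance calculations, since every candidate swap rescans the whole dataset. That growth is the kind of cost the abstract says pruning the search space is meant to reduce.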