Advancements in Microcluster and Outlier Detection: From Scalability Improvement by Capitalizing on Similarity Join Operations to a Comprehensive Evaluation of Clustering-Based Techniques

Bibliographic details
Year of defense: 2024
Main author: Vinces, Braulio Valentin Sánchez
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: eng
Defense institution: Biblioteca Digital de Teses e Dissertações da USP
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://www.teses.usp.br/teses/disponiveis/55/55134/tde-11022025-113850/
Abstract: This Ph.D. work addresses the critical challenge of outlier detection in large and complex data sets. We focus on developing efficient and scalable methods to accurately identify anomalies in various data types and scenarios. The first part of the dissertation explores the use of similarity join operations for distance-based outlier detection. We propose two novel methods: MCCATCH, which effectively identifies microclusters in dimensional and nondimensional data sets, and GOOST, which efficiently detects outliers in massive data streams. Both methods leverage similarity joins to achieve superior accuracy, efficiency, and scalability. The second part of the dissertation rigorously investigates the effectiveness of clustering-based outlier detection approaches. Through a comprehensive comparative evaluation, we demonstrate that clustering-based methods can be competitive with state-of-the-art non-clustering-based algorithms, offering advantages in terms of robustness and scalability. Our research contributes to the field of outlier detection by providing novel methodologies and insights into the effectiveness of different approaches. The methods we propose have practical implications for a wide range of applications, including fraud detection, network intrusion detection, and medical diagnosis.