Advancements in Microcluster and Outlier Detection: From Scalability Improvement by Capitalizing on Similarity Join Operations to a Comprehensive Evaluation of Clustering-Based Techniques

Vinces, Braulio Valentin Sánchez

Bibliographic details
Defense year:	2024
Main author:	Vinces, Braulio Valentin Sánchez
Advisor:	Not provided by the institution
Defense committee:	Not provided by the institution
Document type:	Thesis
Access type:	Open access
Language:	eng
Defense institution:	Biblioteca Digitais de Teses e Dissertações da USP
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords:	Clustering-based outlier detection; Distance-based outlier detection; Microcluster detection in metric data; Real-time stream analysis; Scalability
Access link:	https://www.teses.usp.br/teses/disponiveis/55/55134/tde-11022025-113850/
Abstract:	This Ph.D. work addresses the critical challenge of outlier detection in large and complex data sets. We focus on developing efficient and scalable methods to accurately identify anomalies in various data types and scenarios. The first part of the dissertation explores the use of similarity join operations for distance-based outlier detection. We propose two novel methods: MCCATCH, which effectively identifies microclusters in dimensional and nondimensional data sets, and GOOST, which efficiently detects outliers in massive data streams. Both methods leverage similarity joins to achieve superior accuracy, efficiency, and scalability. The second part of the dissertation rigorously investigates the effectiveness of clustering-based outlier detection approaches. Through a comprehensive comparative evaluation, we demonstrate that clustering-based methods can be competitive with state-of-the-art non-clustering-based algorithms, offering advantages in terms of robustness and scalability. Our research contributes to the field of outlier detection by providing novel methodologies and insights into the effectiveness of different approaches. The methods we propose have practical implications for a wide range of applications, including fraud detection, network intrusion detection, and medical diagnosis.
