Efficient online tree, rule-based and distance-based algorithms

Detalhes bibliográficos
Ano de defesa: 2023
Autor(a) principal: Mastelini, Saulo Martiello
Orientador(a): Não Informado pela instituição
Banca de defesa: Não Informado pela instituição
Tipo de documento: Tese
Tipo de acesso: Acesso aberto
Idioma: eng
Instituição de defesa: Biblioteca Digitais de Teses e Dissertações da USP
Programa de Pós-Graduação: Não Informado pela instituição
Departamento: Não Informado pela instituição
País: Não Informado pela instituição
Palavras-chave em Português:
Link de acesso: https://www.teses.usp.br/teses/disponiveis/55/55134/tde-30082023-135843/
Resumo: The fast development of digital technologies has given rise to the constant production of data in different forms and from different sources. While at the beginning of machine learning (ML) studies, data scarcity was a relevant problem for many application domains, nowadays, we may have too much information to handle with traditional ML algorithms. Besides, changes in the underlying data distributions that govern the data generation might render traditional ML solutions useless in real-world applications. Online ML (OML) aims to create solutions able to process data incrementally, with limited computation resource usage, and to deal with time-changing data distributions. Despite successfully creating efficient solutions applied in diverse domains, we have seen a recent growing trend in creating OML algorithms that only focus on predictive performance and overlook computational costs. This observation is even more prevalent when considering regression tasks, using decision trees, decision rules, and ensembles thereof, which are among the most popular OML solutions. Decreasing the computational costs of OML solutions could be more relevant than a slight increase in predictive performance from a real-world application standpoint. Hence, in this thesis, we focus on creating improved and efficient OML algorithms whose primary focus is decreasing the time and memory costs of tree and decision rule-based regressors and ensemble-based regressors. The desired bi-product is improving or, at least, leaving the predictive performance unchanged. We also explore an efficient algorithm to perform incremental nearest-neighbor searches. This thesis is organized as an article collection, comprehending our most relevant publications focused on the presented theme. We tackle strategies to create low-error ensemble-based regressors, efficient strategies to build incremental decision tree regressors, propose a fast and accurate decision tree-based ensemble regressor, and explore an efficient and versatile algorithm to perform nearest neighbor search in sliding windows.