Automated Multi-Label Classification: Methods, Issues and Prospects

Bibliographic details
Year of defense: 2019
Main author: Alex Guimarães Cardoso de Sá
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: eng
Defending institution: Universidade Federal de Minas Gerais
Brazil
ICEX - INSTITUTO DE CIÊNCIAS EXATAS
ICX - DEPARTAMENTO DE CIÊNCIA DA COMPUTAÇÃO
Programa de Pós-Graduação em Ciência da Computação
UFMG
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Access link: http://hdl.handle.net/1843/58601
https://orcid.org/0000-0002-7276-7839
Abstract: Automated Machine Learning (AutoML) has emerged to deal with the task of automatically selecting learning algorithms and their hyper-parameters to successfully solve a given Machine Learning (ML) problem. This is mainly done to avoid ad hoc approaches to this task. With the growing popularity of ML algorithms and their indiscriminate use by practitioners, who do not necessarily know the peculiarities of these algorithms, the field of AutoML has become more relevant than ever. This thesis, in particular, is centered on AutoML for Multi-Label Classification (MLC) problems. In MLC, each example in the dataset can be simultaneously associated with several class labels, making it a generalization of its canonical single-label version (i.e., with a single class label per example). Essentially, MLC is concerned with learning a model that separates the class labels into relevant and irrelevant ones for each example in the dataset. Although the field of AutoML has progressed and introduced effective methods for Single-Label Classification (SLC) and regression problems, several issues in AutoML research remain open. This thesis focuses on three of them. First, we investigate whether our four proposed AutoML methods can work for MLC problems as well as AutoML methods work for SLC and regression problems. Despite the inherent challenges in MLC (e.g., the hardness of learning from this type of data, the difficulty of evaluating its models, and the computational cost involved), our results showed that it is possible to develop AutoML methods for MLC problems that perform as well as or better than well-known global and local search methods. Second, we present an analysis relating the size of three designed search spaces to the performance of the AutoML methods in recommending configured learning algorithms.
By increasing and decreasing the search space size, we show that the proposed AutoML methods do not satisfactorily trade off exploration (novelty) against exploitation (locality), despite their results. Our convergence analysis also indicated that we must still improve the proposed AutoML methods (i.e., their internal mechanisms) to ensure this trade-off. Finally, we investigate how distinct time budgets (constraining the whole AutoML process) can influence and constrain the behavior of the AutoML search methods and their overall predictive performance.
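
As a rough illustration of the MLC setting described in the abstract, the sketch below encodes each example's label set as a vector of relevant (1) and irrelevant (0) labels and scores predictions with two commonly used MLC measures, Hamming loss and exact match. The label names, toy data, and helper functions are hypothetical and are not taken from the thesis; the abstract does not state which evaluation measures the thesis uses.

```python
# Illustrative sketch only: the label set, toy data, and helper names are
# hypothetical and do not come from the thesis.

LABELS = ["sports", "politics", "tech"]  # hypothetical label set


def to_indicator(label_set, labels=LABELS):
    """Encode a set of relevant labels as a 0/1 vector (1 = relevant)."""
    return [1 if label in label_set else 0 for label in labels]


def hamming_loss(y_true, y_pred):
    """Fraction of (example, label) pairs whose relevance is predicted wrongly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for row_true, row_pred in zip(y_true, y_pred)
        for t, p in zip(row_true, row_pred)
    )
    return wrong / total


def exact_match(y_true, y_pred):
    """Fraction of examples whose whole label vector is predicted correctly."""
    hits = sum(row_true == row_pred for row_true, row_pred in zip(y_true, y_pred))
    return hits / len(y_true)


# Each example may carry several labels at once, or none at all.
y_true = [to_indicator(s) for s in [{"sports"}, {"politics", "tech"}, set()]]
y_pred = [to_indicator(s) for s in [{"sports"}, {"politics"}, {"tech"}]]

print("Hamming loss:", hamming_loss(y_true, y_pred))
print("Exact match:", exact_match(y_true, y_pred))
```

The two measures can disagree sharply: a prediction that misses a single label counts as fully wrong under exact match but only partially wrong under Hamming loss, which is one reason evaluating MLC models is harder than evaluating single-label ones.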