Label operation for multi-label learning

Bibliographic details
Year of defense: 2020
Main author: Silva, Adriano Rivolli da
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: eng
Defense institution: Biblioteca Digitais de Teses e Dissertações da USP
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://www.teses.usp.br/teses/disponiveis/55/55134/tde-18082020-161950/
Abstract: Classification tasks in which instances are associated with multiple concepts are known as multi-label classification. They have attracted growing attention in the machine-learning community, given the large number of applications and the amount of multi-labeled data now available. Consequently, many strategies have been proposed that explore different particularities, such as label imbalance, dimensionality reduction and label dependence. Despite that, some aspects that may affect strategies as a whole have been overlooked. For instance, some strategies transform the original multi-labeled data into single-labeled data upon which a base algorithm can be applied. However, the impact of choosing one base algorithm over another is unknown and usually ignored. Moreover, it was observed that many labels are never correctly predicted regardless of the strategy used. So far, very little attention has been paid to these issues, which may produce misleading results. Therefore, this thesis aims to investigate multi-label strategies with respect to these particularities. To this end, an extensive comparative study is performed, focusing on the influence of the base algorithms on the results. Moreover, label operation is proposed as an optimization procedure able to reduce the number of labels never predicted. Through an empirical methodology, label expansion and reduction improved different evaluation measures, mitigating the label prediction problem, although it was not completely removed. Additionally, meta-learning is used to reduce the complexity of the operations and to provide some understanding of the studied issue. In this context, characterization measures for meta-learning were systematically investigated, which resulted in a new taxonomy to organize them. In summary, the findings and contributions presented here are relevant to the multi-label and meta-learning research fields. They potentially have an impact on the methodology and raise new open questions concerning unnoticed aspects of these areas.
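
As a rough illustration of two ideas mentioned in the abstract (transforming multi-labeled data into single-labeled problems for a base algorithm, and labels that are never correctly predicted), the sketch below assumes a binary relevance transformation and a toy majority-class base algorithm. It is not the thesis's code: the data, the class `MajorityClassifier` and the helper names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def binary_relevance_split(Y):
    """Turn an n x q label matrix into q single-label (binary) target lists."""
    q = len(Y[0])
    return [[row[j] for row in Y] for j in range(q)]


class MajorityClassifier:
    """Toy base algorithm (assumption): always predicts the most frequent class."""

    def fit(self, X, y):
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


def never_correctly_predicted(Y_true, Y_pred):
    """Indices of labels that have no true positive among the predictions."""
    q = len(Y_true[0])
    return [
        j for j in range(q)
        if not any(t[j] == 1 and p[j] == 1 for t, p in zip(Y_true, Y_pred))
    ]


# Hypothetical toy data: 6 instances, 3 labels; label 2 is rare.
X = [[0], [1], [2], [3], [4], [5]]
Y = [[1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 1, 0], [1, 1, 0]]

# One base model per label (binary relevance), then reassemble the predictions.
models = [MajorityClassifier().fit(X, y_j) for y_j in binary_relevance_split(Y)]
columns = [m.predict(X) for m in models]
Y_pred = [list(row) for row in zip(*columns)]

missed = never_correctly_predicted(Y, Y_pred)
print(missed)  # label indices never correctly predicted
```

With this deliberately weak base algorithm the rare label is never predicted at all. Swapping in a different base algorithm changes which labels are missed, which is the kind of sensitivity the abstract says is usually ignored.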