ML-MDLText: an incremental learning method for multilabel text classification

Bibliographic details
Year of defense: 2020
Main author: Bittencourt, Marciele de Menezes
Advisor: Almeida, Tiago Agostinho de
Defense committee: Not provided by the institution
Document type: Dissertation
Access type: Open access
Language: Portuguese (por)
Defense institution: Universidade Federal de São Carlos
Câmpus Sorocaba
Graduate program: Programa de Pós-Graduação em Ciência da Computação - PPGCC-So
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
CNPq knowledge area:
Access link: https://repositorio.ufscar.br/handle/20.500.14289/12436
Abstract: Single-label text classification has been studied extensively in recent decades, with most attention given to offline learning scenarios, in which all training data is available in advance. However, real-world text classification problems often involve multilabel instances and textual patterns that change frequently. In this context, methods should ideally predict a subset of target labels rather than a single one, and update their models incrementally so that they scale and adapt to changes in data patterns within limited time and memory. Therefore, online and multilabel learning have attracted great research interest, since few methods can address both problems simultaneously. In this study, we present a text classification method based on the minimum description length principle. It can be applied to multilabel classification without transforming the classification problem, takes advantage of dependency information among labels, and naturally supports online learning. We evaluated its performance on fifteen datasets from different application domains and compared it with traditional benchmark classifiers in both offline and online learning scenarios. The results obtained by the proposed method were competitive with those of existing state-of-the-art methods.