Classificação automática de documentos: seleção customizada do classificador (Automatic document classification: customized classifier selection)

Bibliographic details
Year of defense: 2020
Main author: Silva, Paulo Henrique da
Advisor: Martins, Wellington Santos
Examining committee: Martins, Wellington Santos; Rosa, Thierson Couto; Sousa, Daniel Xavier de
Document type: Master's dissertation
Access type: Open access
Language: Portuguese (por)
Defense institution: Universidade Federal de Goiás
Graduate program: Programa de Pós-graduação em Ciência da Computação (INF)
Department: Instituto de Informática - INF (RG)
Country: Brazil
Keywords (Portuguese):
Keywords (English):
CNPq knowledge area:
Access link: http://repositorio.bc.ufg.br/tede/handle/tede/11175
Abstract: The recent increase in digitally stored data has spurred the development of methods to organize and extract relevant knowledge from this large volume of data. Automatic document classification (ADC) is one such method. Considered one of the most relevant and challenging tasks in information retrieval because of the high dimensionality and sparsity of the data, ADC uses machine learning techniques to assign documents to classes. Recent work advocates multiple classifier systems (MCS) as a way to improve ADC accuracy. These systems combine a set of classifiers to obtain better results than any single classifier. One of the most promising MCS approaches is dynamic selection (DS), in which the base classifiers are selected on the fly for each new query (test) document to be classified. This work proposes a customized selection of the classification method, performed at query (test) time: only the most competent classifier, or the most competent set of classifiers, is selected to predict the label of each query document. In addition, the work exploits parallelism to speed up the ADC task. Experiments on standard datasets show competitive and promising results relative to the baselines. Further opportunities for exploiting parallelism are also presented as future work.
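The following sketch makes the dynamic selection idea concrete using Overall Local Accuracy (OLA). OLA is a classic DS criterion from the literature and is not necessarily the selection rule proposed in this dissertation. The sketch assumes two things: competence is measured on a labeled validation set, and a query's region of competence is its k nearest validation documents under Euclidean distance. The functions `k_nearest`, `local_accuracy`, and `ola_predict` and the toy classifiers `always_pos` and `always_neg` are hypothetical names introduced only for illustration. When several classifiers tie for the highest local accuracy, the tied set votes by majority. This loosely mirrors selecting "the most competent set of classifiers."

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

Vector = Sequence[float]
Classifier = Callable[[Vector], str]


def k_nearest(query: Vector, points: List[Vector], k: int) -> List[int]:
    """Indices of the k validation points closest to the query (Euclidean)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return order[:k]


def local_accuracy(clf: Classifier, idx: List[int],
                   X_val: List[Vector], y_val: List[str]) -> float:
    """Fraction of the neighbors in idx that clf labels correctly."""
    hits = sum(1 for i in idx if clf(X_val[i]) == y_val[i])
    return hits / len(idx)


def ola_predict(query: Vector, pool: List[Classifier],
                X_val: List[Vector], y_val: List[str], k: int = 3) -> str:
    """Overall Local Accuracy (OLA) dynamic selection (hypothetical helper).

    1. Find the query's region of competence: its k nearest validation points.
    2. Score every base classifier by its accuracy inside that region.
    3. Keep the classifier(s) with the highest score; tied ones vote by majority.
    """
    idx = k_nearest(query, X_val, k)
    scores = [local_accuracy(c, idx, X_val, y_val) for c in pool]
    best = max(scores)
    selected = [c for c, s in zip(pool, scores) if s == best]
    votes = Counter(c(query) for c in selected)
    return votes.most_common(1)[0][0]


# Toy usage: two deliberately weak base classifiers (hypothetical).
def always_pos(x: Vector) -> str:
    return "pos"


def always_neg(x: Vector) -> str:
    return "neg"


pool = [always_pos, always_neg]
X_val = [[-2.0], [-1.0], [1.0], [2.0]]   # 1-D stand-ins for validation documents
y_val = ["neg", "neg", "pos", "pos"]

print(ola_predict([-1.5], pool, X_val, y_val, k=2))  # neg
print(ola_predict([1.5], pool, X_val, y_val, k=2))   # pos
```

In the toy example, neither constant classifier is accurate globally. However, each one is selected for queries that fall in the region where it is locally accurate, which is the core intuition behind per-query selection. Because each query is scored independently, the per-query loop is also a natural candidate for parallel execution.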