Genetic algorithm for multiple data imputation in multi-label classification (original title: Algoritmos genético para imputação múltipla de dados na classificação multirrótulo)

JACOB JUNIOR, Antonio Fernando Lavareda

Bibliographic details
Year of defense:	2024
Main author:	JACOB JUNIOR, Antonio Fernando Lavareda
Advisor:	SANTANA, Ewaldo Eder Carvalho
Defense committee:	SANTANA, Ewaldo Eder Carvalho; LOBATO, Fábio Manoel França; BARROS FILHO, Allan Kardec Duailibe; SILVA, Francisco Jose Da Silva e; CORTES, Omar Andres Carmona
Document type:	Thesis
Access type:	Open access
Language:	Portuguese (por)
Defending institution:	Universidade Federal do Maranhão
Graduate program:	PROGRAMA DE PÓS-GRADUAÇÃO EM ENGENHARIA DE ELETRICIDADE/CCET
Department:	DEPARTAMENTO DE ENGENHARIA DA ELETRICIDADE/CCET
Country:	Brazil
Keywords in Portuguese:	valores ausentes; classificação multirrótulo; algoritmos genéticos.
Keywords in English:	missing values; multi-label classification; genetic algorithms.
CNPq knowledge area:	Exact and Earth Sciences (Ciências Exatas e da Terra)
Access link:	https://tedebc.ufma.br/jspui/handle/tede/5255
Abstract:	Missing data is a prevalent problem that requires attention, as most data analysis techniques cannot handle it. This is particularly critical in Multi-Label Classification (MLC), an application domain in which only a few studies have investigated missing data. MLC differs from Single-Label Classification (SLC) by allowing an instance to be associated with multiple classes. Movie classification is a didactic example, since a movie can be both “drama” and “biography” simultaneously. One of the most common treatments for missing data is data imputation, which seeks plausible values to fill in the missing ones. In this scenario, we propose a novel imputation method based on a multi-objective genetic algorithm for optimizing multiple data imputations, called Multiple Imputation of Multi-label Classification data with a genetic algorithm, or simply EvoImp. We applied the proposed method in multi-label learning and evaluated its performance on six synthetic databases, considering various missing-value distribution scenarios. The method was compared with other state-of-the-art imputation strategies, such as K-Means Imputation (KMI) and weighted K-Nearest Neighbors Imputation (WKNNI). The results showed that the proposed method outperformed the baselines in all scenarios, achieving the best values for the Exact Match, Accuracy, and Hamming Loss evaluation measures. The superior results were consistent across dataset domains and sizes, demonstrating the robustness of EvoImp. Thus, EvoImp represents a feasible solution for missing data treatment in multi-label learning.
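
The abstract names Exact Match, Accuracy, and Hamming Loss but does not define them. As a rough illustration only, the sketch below computes the commonly used example-based versions of these measures on a toy movie-genre problem. It assumes, without confirmation from this record, that the thesis follows these standard definitions (in particular, Accuracy taken as the mean Jaccard similarity between label sets). The function names and the toy data are hypothetical and are not taken from the thesis.

```python
# Illustrative multi-label evaluation measures; NOT the thesis implementation.
# Assumption: standard example-based definitions, with each instance's labels
# represented as a Python set of label names.

def exact_match(y_true, y_pred):
    """Fraction of instances whose predicted label set equals the true set."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def hamming_loss(y_true, y_pred, n_labels):
    """Fraction of (instance, label) pairs that are predicted incorrectly."""
    wrong = sum(len(t ^ p) for t, p in zip(y_true, y_pred))
    return wrong / (len(y_true) * n_labels)

def accuracy(y_true, y_pred):
    """Mean Jaccard similarity between true and predicted label sets."""
    scores = []
    for t, p in zip(y_true, y_pred):
        union = t | p
        # Two empty label sets are treated as a perfect match.
        scores.append(len(t & p) / len(union) if union else 1.0)
    return sum(scores) / len(scores)

# Toy data (hypothetical): four movies, four possible genres.
y_true = [{"drama", "biography"}, {"comedy"}, {"drama"}, {"action", "comedy"}]
y_pred = [{"drama", "biography"}, {"comedy", "drama"}, {"drama"}, {"action"}]

print(exact_match(y_true, y_pred))              # 0.5
print(hamming_loss(y_true, y_pred, n_labels=4)) # 0.125
print(accuracy(y_true, y_pred))                 # 0.75
```

Exact Match is the strictest of the three, because a single wrong label makes the whole instance count as an error, whereas Hamming Loss and Accuracy give partial credit. Higher is better for Exact Match and Accuracy; lower is better for Hamming Loss.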
