Genetic feature selection with individual per-bit mutation based on Pearson correlation and variable clustering using dissimilarity measures

Bibliographic details
Defense year: 2019
Main author: Araujo, Adriano Gomes Sabino de
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: Portuguese (por)
Defense institution: Universidade Federal do Rio de Janeiro (UFRJ), Brazil; Instituto Alberto Luiz Coimbra de Pós-Graduação e Pesquisa de Engenharia; Programa de Pós-Graduação em Engenharia de Sistemas e Computação
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://hdl.handle.net/11422/14064
Abstract: Reducing the number of dimensions of a problem not only reduces the processing time of the learning technique used but also improves its performance. Feature selection and feature clustering are two important ways to achieve this reduction. Feature selection searches for the ideal feature set for a problem, that is, the set that yields the best result with a given predictor. Feature clustering groups dimensions and uses the resulting clusters to generate a new input set for the problem. This work introduces a genetic algorithm for feature selection that differs from others in two respects: (1) each bit has its own mutation rate, proportional to the Pearson correlation coefficient, and (2) the initial population is generated from the same coefficient. It also presents a feature clustering algorithm that, unlike other works in the literature, merges more dissimilar dimensions. Experiments with both algorithms gave promising results. Each algorithm performed well on its own, and applying them one after the other yielded better performance. The experiments were carried out on several datasets, the main one being the Reuters-21578 text classification dataset. The best result reached a precision (P) of 0.9890, a recall (R) of 0.9815, and an F1 of 0.9852. On Reuters-21578, the result was compared with those of three other papers and was superior to the best of them [UĞUZ, 2011].
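The abstract names the two Pearson-based ingredients of the genetic algorithm but does not give their formulas. What follows is a minimal sketch under stated assumptions, not the thesis's implementation. It assumes the following:

- The coefficient is the absolute Pearson correlation between each feature and a numeric target.
- A bit's mutation probability is `scale * |r_j| / max|r|`, a literal reading of "proportional".
- The same ratio gives the probability that bit j starts switched on in the initial population.

The helper names (`pearson_per_feature`, `bit_probabilities`, `init_population`, `mutate`) and the synthetic data are hypothetical.

```python
import numpy as np

# Hypothetical sketch: per-bit mutation rates and initial population driven by
# Pearson correlation. The exact mapping used in the thesis is NOT given in the
# abstract; the "proportional to |r|" rule below is an assumption.

def pearson_per_feature(X, y):
    """Absolute Pearson correlation between each column of X and target y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Zero-variance columns get correlation 0 instead of a division error.
    r = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    return np.abs(r)


def bit_probabilities(r_abs, scale):
    """Assumed rule: probability for bit j = scale * |r_j| / max_k |r_k|."""
    top = r_abs.max()
    if top == 0:
        return np.full_like(r_abs, scale)
    return scale * r_abs / top


def init_population(r_abs, pop_size, rng):
    """Bit j starts as 1 with probability proportional to |r_j| (assumption)."""
    probs = bit_probabilities(r_abs, 1.0)
    return (rng.random((pop_size, r_abs.size)) < probs).astype(np.int8)


def mutate(individual, rates, rng):
    """Flip each bit j independently with its own probability rates[j]."""
    flips = rng.random(individual.size) < rates
    return np.where(flips, 1 - individual, individual).astype(np.int8)


# Usage on synthetic data (hypothetical): only features 0 and 3 drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.1, size=200)

r_abs = pearson_per_feature(X, y)
rates = bit_probabilities(r_abs, scale=0.05)  # highest per-bit rate is 5%
population = init_population(r_abs, pop_size=10, rng=rng)
child = mutate(population[0], rates, rng)
print(np.round(r_abs, 3), population.shape, child)
```

Under this reading, strongly correlated features are more likely to start selected and also more likely to be flipped. A feature with zero correlation never starts selected. The abstract does not say whether the thesis intends this direction or uses the coefficient differently, for example to bias which way a bit flips.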
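As a consistency check on the reported figures, assume the F1 quoted is the standard harmonic mean of precision and recall, F1 = 2PR / (P + R), computed from the same P and R. Under that assumption the three values agree to four decimal places. The function name `f1_score` here is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall: F1 = 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(0.9890, 0.9815)
print(f"{f1:.4f}")  # 0.9852
```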