A thesis submitted to the Faculty of Graduate and Postdoctoral Studies of the University of Ottawa in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Bibliographic details
Year of defense: 2004
Lead author: Souza, Jerffeson Teixeira de
Advisor: Not provided by the institution
Examining committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: por
Defending institution: University of Ottawa
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://siduece.uece.br/siduece/trabalhoAcademicoPublico.jsf?id=83883
Abstract: The Feature Selection problem involves discovering a subset of features such that a classifier built only with this subset would have better predictive accuracy than a classifier built from the entire set of features. A large number of algorithms have already been proposed for the feature selection problem. Although they differ significantly with regard to 1) the search strategy they use to determine the right subset of features and 2) how each subset is evaluated, feature selection algorithms are usually classified into three general groups: Filters, Wrappers and Hybrid solutions.

In this thesis, we propose a new hybrid system for the problem of feature selection in machine learning. The idea behind this new algorithm, FortalFS, is to extract and combine the best characteristics of filters and wrappers in one algorithm. FortalFS uses results from another feature selection system as a starting point in the search through subsets of features that are evaluated by a machine learning algorithm. With an efficient search heuristic, we can decrease the number of subsets of features to be evaluated by the learning algorithm, consequently decreasing computational effort while still being able to select an accurate subset. We have also designed a variant of the original algorithm in an attempt to work with feature weighting algorithms.

In order to evaluate this new algorithm, a number of experiments were run and the results compared to well-known feature selection filter and wrapper algorithms, such as Focus, Relief, LVF, and others. These experiments were run over a number of datasets from the UCI Repository. Results showed that FortalFS significantly outperforms most of the algorithms. However, its running time is similar to that of wrappers. Additional experiments using specially designed artificial datasets demonstrated that FortalFS is able to identify and remove irrelevant, redundant and randomly