New criteria for selecting neural models in classification problems with imbalanced data

Bibliographic details
Year of defense: 2011
Main author: Cristiano Leite de Castro
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: por
Defending institution: Universidade Federal de Minas Gerais
UFMG
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://hdl.handle.net/1843/BUOS-8WHGE7
Abstract: Artificial Neural Network learners induced from complex and highly imbalanced data sets tend to yield classification models that are biased towards the overrepresented (majority) class. Although some approaches in the literature address this issue, they are limited in their formalization and theoretical characterization of the problem. Here, a formal analysis of the nature of the class imbalance problem is presented, based on Bayesian Decision and Statistical Learning theories. As shown, the problem arises as a direct consequence of the minimization of a (general) criterion based on the overall error rate and the level of distribution overlap (noise). Furthermore, two new learning algorithms for the MultiLayer Perceptron topology are designed: WEMLP and AUCMLP. Both are formulated from specific model selection criteria that differ from the overall error. The cost function of the WEMLP algorithm uses a parameter to assign unequal losses (costs) to the errors of each class. The AUCMLP algorithm optimizes a differentiable approximation of the Wilcoxon-Mann-Whitney statistic, a metric analogous to the AUC (Area Under the ROC Curve). To incorporate an effective strategy for controlling the complexity (flexibility) of the models, multiobjective (MOBJ) extensions of the WEMLP and AUCMLP formulations are provided. Based on a statistical significance analysis of results on real data, our approach shows a significant improvement in classification ranking quality and achieves high and balanced accuracy rates for both classes.
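The two criteria named in the abstract can be illustrated with a minimal NumPy sketch. This is not the thesis's formulation: the abstract does not give the exact cost functions, so the per-class weighted squared error with weight `lam`, the sigmoid surrogate with slope `beta`, and all function names below are assumptions chosen only to show the general idea.

```python
import numpy as np


def weighted_error(y_true, y_score, lam=0.5):
    """Class-weighted squared error (assumed form, not taken from the thesis).

    lam weights the mean error on the positive (minority) class and
    (1 - lam) weights the mean error on the negative (majority) class.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_true == 1
    neg = ~pos
    err_pos = np.mean((1.0 - y_score[pos]) ** 2)  # target 1 for positives
    err_neg = np.mean((0.0 - y_score[neg]) ** 2)  # target 0 for negatives
    return lam * err_pos + (1.0 - lam) * err_neg


def wmw_statistic(y_true, y_score):
    """Exact Wilcoxon-Mann-Whitney statistic, equal to the empirical AUC.

    Fraction of (positive, negative) pairs in which the positive example
    receives the higher score; ties count as one half.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    s_pos = y_score[y_true == 1]
    s_neg = y_score[y_true != 1]
    diff = s_pos[:, None] - s_neg[None, :]  # all pairwise score differences
    return np.mean((diff > 0) + 0.5 * (diff == 0))


def soft_wmw(y_true, y_score, beta=10.0):
    """Differentiable surrogate of the WMW statistic (assumed sigmoid form).

    The step function over pairwise differences is replaced by a sigmoid
    with slope beta, so the value can be optimized by gradient methods.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    s_pos = y_score[y_true == 1]
    s_neg = y_score[y_true != 1]
    diff = s_pos[:, None] - s_neg[None, :]
    return np.mean(1.0 / (1.0 + np.exp(-beta * diff)))


if __name__ == "__main__":
    # A model that always outputs the majority class (score 0 everywhere).
    y = [1, 0, 0, 0]
    always_majority = [0.0, 0.0, 0.0, 0.0]
    print(weighted_error(y, always_majority, lam=0.5))
    print(weighted_error(y, always_majority, lam=0.9))

    # Ranking quality on a small example.
    y2 = [1, 1, 0, 0]
    scores = [0.8, 0.3, 0.5, 0.1]
    print(wmw_statistic(y2, scores))
    print(soft_wmw(y2, scores, beta=50.0))
```

In this sketch, a model that always predicts the majority class has an overall mean squared error of only 0.25 on the first example, while the class-weighted error penalizes it more heavily as `lam` grows. The sigmoid surrogate approaches the exact pairwise statistic as `beta` increases, at the price of steeper gradients.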