Estratégias de seleção de atributos para detecção de anomalias em transações eletrônicas (Feature selection strategies for anomaly detection in electronic transactions)
Year of defense: | 2016 |
---|---|
Lead author: | |
Advisor: | |
Defense committee: | |
Document type: | Master's dissertation |
Access type: | Open access |
Language: | por |
Defense institution: | Universidade Federal de Minas Gerais (UFMG) |
Graduate program: | Not provided by the institution |
Department: | Not provided by the institution |
Country: | Not provided by the institution |
Keywords in Portuguese: | |
Access link: | http://hdl.handle.net/1843/ESBF-ACXKTA |
Abstract: | Anomaly detection refers to the problem of finding patterns in data that deviate from the expected average behavior. One of the classic scenarios in this area is fraud detection, which consists of learning fraudulent behavior from a set of observations. In electronic transactions, there is a large amount of information that could be used to detect fraud. Filtering this information and choosing its most representative subset is therefore a crucial task, known as Feature Selection. The best Feature Selection methods use the class information to perform this task. However, an important characteristic of fraud detection problems is the high imbalance between the classes. This imbalance poses a new challenge to Feature Selection techniques, which tend to select features in favor of the dominant class. Therefore, in this work we analyzed feature selection strategies for anomaly detection in electronic transactions. These strategies were divided into two distinct approaches. In the first approach, we applied 7 resampling methods, including one created in this work, to reduce the imbalance between classes before the feature selection step. In the second approach, we evaluated 8 feature selection methods considered insensitive to class imbalance, and we also created a method that uses the concept of the Pareto Frontier to combine metrics. The effectiveness of the methods was validated by building fraud detection models, applying 3 different classification techniques to the attributes selected by the different approaches. To validate these models, we performed fraud detection case studies on 2 real datasets from electronic payment systems and evaluated the models with 3 different metrics. Through these experiments, we validated our research hypothesis, providing contributions to the area of feature selection for fraud detection. The best models achieved economic gains of up to 57% compared to the company's actual scenario. |
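
The abstract mentions combining metrics through the concept of the Pareto Frontier but does not describe how the method works. As a rough illustration of the general idea only, and not the author's method, the sketch below keeps the features that are not dominated by any other feature across several scores. The feature names, the two scores per feature, and the "higher is better" convention are all assumptions made for this example.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is at least as good as b on every metric
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_frontier(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return the features that no other feature dominates."""
    return [
        name
        for name, s in scores.items()
        if not any(
            dominates(other, s)
            for other_name, other in scores.items()
            if other_name != name
        )
    ]


# Hypothetical features, each scored by two hypothetical ranking metrics.
scores = {
    "amount": (0.9, 0.4),
    "hour": (0.5, 0.8),
    "country": (0.4, 0.3),
    "device": (0.9, 0.2),
}

frontier = pareto_frontier(scores)
print(frontier)
```

Here "country" and "device" are dropped because "amount" scores at least as well on both metrics and strictly better on one, while "amount" and "hour" each win on a different metric and so both stay on the frontier.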