Detecção de fraudes em transações com cartão de crédito: uma comparação do desempenho de técnicas inteligentes com base na avaliação da função de custo

Detalhes bibliográficos
Ano de defesa: 2023
Autor(a) principal: Leonardis, Roger Willian Joel lattes
Orientador(a): Sassi, Renato José lattes
Banca de defesa: Sassi, Renato José lattes, Kitani, Edson Caoru lattes, Pereira, Fabio Henrique lattes
Tipo de documento: Dissertação
Tipo de acesso: Acesso aberto
Idioma: por
Instituição de defesa: Universidade Nove de Julho
Programa de Pós-Graduação: Programa de Pós-Graduação em Informática e Gestão do Conhecimento
Departamento: Informática
País: Brasil
Palavras-chave em Português:
Palavras-chave em Inglês:
Área do conhecimento CNPq:
Link de acesso: http://bibliotecatede.uninove.br/handle/tede/3242
Resumo: Credit card fraud detection faces an issue with the number of the fraud transactions being lower than non-fraud, making it difficult for machine learning models to effectively detect them. There are different types of solution to fix this imbalance, Oversampling and Undersampling can be used to deal with it. To evaluate and compare the performance of the machine learning models, metrics like the Confusion Matrix (CM), the Matthew Correlation Coefficient (MCC), the Area under the Curve (AUC) and the Cost Function (FC) can be applied. The FC result quantifies the financial impact caused by a real fraud misclassification and, because it is cost related, the lower its value the better its performance. The objective of this experiment was to compare the performance of machine learning models using the FC to detect fraud in credit card transactions in an unbalanced dataset. The dataset contains information about European credit cards transactions collected in 2013. The following models were applied: Logistic Regression (RL), Decision Trees (DT), Random Forest (RF), Support Vector Machine (SVM), Deep Learning (DL) and XGBoost (XG), over the unbalanced and balanced databases with Oversampling and Undersampling. To evaluate and compare the results, CM, MCC, AUC and FC were used. The best performance for AUC was RL with Oversampling, for MCC was for RF applied to the unbalanced base and for FC also RL with Oversampling presented the best performance. The reasons why RL with Oversampling outperformed the other models in two out of the three metrics may be connected to the common use of this model in fraud detection problems, therefore presented more adherence to the database used. As stated in the principle of Occam's Razor the recommendation for Machine Learning models use is to adopt the simplest one: RL with Oversampling. When considering the cost of an incorrect prediction, it is not enough to evaluate only the results obtained with the AUC and MCC metrics, one should also consider the results of the FC to support of a machine learning model definition.