Fraud detection in credit card transactions: a comparison of the performance of intelligent techniques based on cost function evaluation

Leonardis, Roger Willian Joel

Bibliographic details
Year of defense:	2023
Main author:	Leonardis, Roger Willian Joel
Advisor:	Sassi, Renato José
Examining committee:	Sassi, Renato José, Kitani, Edson Caoru, Pereira, Fabio Henrique
Document type:	Master's dissertation
Access type:	Open access
Language:	Portuguese
Degree-granting institution:	Universidade Nove de Julho
Graduate program:	Graduate Program in Informatics and Knowledge Management (Programa de Pós-Graduação em Informática e Gestão do Conhecimento)
Department:	Informatics
Country:	Brazil
Keywords (Portuguese):	fraudes em cartão de crédito; função de custo; custo de classificação; inteligência artificial; base de dados desbalanceada
Keywords (English):	credit card fraud; cost function; classification cost; artificial intelligence; imbalanced dataset
CNPq knowledge area:	Computer Science::Computer Systems (CIÊNCIA DA COMPUTAÇÃO::SISTEMAS DE COMPUTAÇÃO)
Access link:	http://bibliotecatede.uninove.br/handle/tede/3242
Abstract:	In credit card fraud detection, fraudulent transactions are far fewer than legitimate ones. This imbalance makes it difficult for machine learning models to detect fraud effectively. Resampling techniques such as Oversampling and Undersampling can be used to correct the imbalance. Several metrics can be applied to evaluate and compare the performance of machine learning models: the Confusion Matrix (CM), the Matthews Correlation Coefficient (MCC), the Area Under the Curve (AUC) and the Cost Function (FC). The FC quantifies the financial impact of misclassifying real fraud. Because it measures cost, the lower its value, the better the model's performance. The objective of this study was to compare the performance of machine learning models, using the FC, in detecting fraud in credit card transactions from an imbalanced dataset. The dataset contains information on transactions made with European credit cards, collected in 2013. The following models were applied: Logistic Regression (RL), Decision Trees (DT), Random Forest (RF), Support Vector Machine (SVM), Deep Learning (DL) and XGBoost (XG). Each was applied to the imbalanced dataset and to versions balanced with Oversampling and with Undersampling. CM, MCC, AUC and FC were used to evaluate and compare the results. RL with Oversampling achieved the best AUC, and RF on the imbalanced dataset achieved the best MCC. RL with Oversampling also achieved the best (lowest) FC. RL with Oversampling may have outperformed the other models on two of the three metrics because this model is commonly used in fraud detection problems, which may explain its closer fit to the dataset used. Following the principle of Occam's Razor, the recommendation is to adopt the simplest model: RL with Oversampling.
When the cost of an incorrect prediction matters, evaluating only the AUC and MCC results is not enough. The FC results should also be considered when choosing a machine learning model.
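The abstract compares models with CM, MCC and FC, and argues that a model can look good on MCC while still being expensive. The following is a minimal, dependency-free Python sketch of how the three could be computed side by side.

The abstract does not give the FC formula. `misclassification_cost` therefore uses an assumed, hypothetical cost model:

- a missed fraud (false negative) costs the transaction amount;
- every raised alert (true or false positive) costs a fixed investigation fee, `admin_cost`.

The function names, the fee value, and the toy labels, amounts and predictions are all invented for illustration. They are not data from the study.

```python
import math
from typing import Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating label 1 as fraud and 0 as legitimate."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews Correlation Coefficient computed from the confusion matrix."""
    tp, fp, tn, fn = confusion_matrix(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any row or column of the matrix is empty
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def misclassification_cost(y_true: Sequence[int], y_pred: Sequence[int],
                           amounts: Sequence[float], admin_cost: float = 2.0) -> float:
    """HYPOTHETICAL cost function (an assumption, not the study's definition).

    Every alert (predicted fraud) costs `admin_cost`; every missed fraud
    costs its transaction amount; correctly ignored legitimate transactions cost 0.
    """
    cost = 0.0
    for t, p, amount in zip(y_true, y_pred, amounts):
        if p == 1:
            cost += admin_cost  # alert raised: investigation fee (TP or FP)
        elif t == 1:
            cost += amount      # false negative: the fraud amount is lost
    return cost


# Toy data (invented): 1 = fraud, 0 = legitimate
y_true = [0, 0, 0, 1, 1, 0, 0, 1]
amounts = [10.0, 25.0, 5.0, 900.0, 40.0, 12.0, 7.0, 15.0]
model_a = [0, 0, 1, 1, 0, 0, 0, 1]  # one false alarm, misses the 40.0 fraud
model_b = [0, 0, 0, 0, 1, 0, 0, 1]  # no false alarms, misses the 900.0 fraud

for name, pred in [("model_a", model_a), ("model_b", model_b)]:
    print(name, confusion_matrix(y_true, pred),
          round(mcc(y_true, pred), 3), misclassification_cost(y_true, pred, amounts))
```

With these toy values, `model_b` has the higher MCC (about 0.745 versus about 0.467). It also costs far more: 904.0 against 46.0 for `model_a`, because the one fraud it misses is the largest transaction. This is the kind of disagreement between MCC and cost that the abstract's final paragraph points to.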
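The abstract mentions Oversampling and Undersampling but does not name a specific algorithm. As an assumption, this sketch shows the plainest variants in dependency-free Python:

- random oversampling, which duplicates minority-class rows drawn with replacement;
- random undersampling, which keeps a random subset of the majority class.

The study may have used a different method or library. `random_oversample` and `random_undersample` are hypothetical helper names. In common practice, resampling is applied only to the training split so that the test set keeps the real class imbalance.

```python
import random
from collections import Counter


def _split_by_label(X, y):
    """Group rows of X by their class label, preserving order."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return groups


def random_oversample(X, y, seed=0):
    """Hypothetical helper: resample every smaller class with replacement
    until it matches the size of the largest class."""
    rng = random.Random(seed)
    groups = _split_by_label(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for x in rows + extra:
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Hypothetical helper: keep a random subset (without replacement) of every
    larger class so that it matches the size of the smallest class."""
    rng = random.Random(seed)
    groups = _split_by_label(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        for x in rng.sample(rows, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


# Toy imbalanced training set (invented): 8 legitimate rows, 2 fraud rows
X_train = list(range(10))
y_train = [0] * 8 + [1] * 2

X_over, y_over = random_oversample(X_train, y_train)
X_under, y_under = random_undersample(X_train, y_train)
print(Counter(y_over), Counter(y_under))
```

Oversampling keeps every original row but repeats minority examples, which can encourage overfitting to them. Undersampling avoids duplicates but discards most of the majority class. The abstract evaluates both options alongside the untouched imbalanced data.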