Bibliographic details
Year of defense: |
2023 |
Main author: |
Leonardis, Roger Willian Joel
|
Advisor: |
Sassi, Renato José
|
Examination committee: |
Sassi, Renato José
,
Kitani, Edson Caoru
,
Pereira, Fabio Henrique
|
Document type: |
Master's thesis
|
Access type: |
Open access |
Language: |
Portuguese |
Degree-granting institution: |
Universidade Nove de Julho
|
Graduate program: |
Programa de Pós-Graduação em Informática e Gestão do Conhecimento
|
Department: |
Informática
|
Country: |
Brazil
|
Keywords in Portuguese: |
|
Keywords in English: |
|
CNPq knowledge area: |
|
Access link: |
http://bibliotecatede.uninove.br/handle/tede/3242
|
Abstract: |
Credit card fraud detection is hampered by class imbalance: fraudulent transactions are far fewer than legitimate ones, which makes it difficult for machine learning models to detect them effectively. Several techniques address this imbalance, among them Oversampling and Undersampling. To evaluate and compare the performance of machine learning models, metrics such as the Confusion Matrix (CM), the Matthews Correlation Coefficient (MCC), the Area Under the Curve (AUC) and the Cost Function (FC) can be applied. The FC quantifies the financial impact of misclassifying an actual fraud and, because it measures cost, lower values indicate better performance. The objective of this experiment was to compare the performance of machine learning models, using the FC, in detecting fraud in credit card transactions in an unbalanced dataset. The dataset contains European credit card transactions collected in 2013. The following models were applied: Logistic Regression (RL), Decision Trees (DT), Random Forest (RF), Support Vector Machine (SVM), Deep Learning (DL) and XGBoost (XG), to the unbalanced dataset and to versions of it balanced with Oversampling and Undersampling. CM, MCC, AUC and FC were used to evaluate and compare the results. RL with Oversampling achieved the best AUC, RF applied to the unbalanced dataset achieved the best MCC, and RL with Oversampling also achieved the best FC. RL with Oversampling may have outperformed the other models on two of the three metrics because this model is commonly used in fraud detection problems and may therefore have fit the dataset better. Following the principle of Occam's Razor, the recommendation is to adopt the simplest model: RL with Oversampling.
When the cost of an incorrect prediction matters, it is not enough to evaluate only the AUC and MCC results; the FC results should also be considered to support the choice of a machine learning model. |
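The abstract's closing point, that a cost-based metric can rank models differently from MCC, can be illustrated with a minimal sketch. Everything below is hypothetical and not taken from the dissertation: the confusion-matrix counts, the per-error costs, and the simple linear cost formula (the record does not state how the FC is actually defined).

```python
import math


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


def misclassification_cost(fp: int, fn: int, cost_fn: float, cost_fp: float) -> float:
    """Assumed linear cost: each missed fraud costs cost_fn and each
    false alarm costs cost_fp. This is an illustrative stand-in for
    the FC, not the dissertation's definition."""
    return fn * cost_fn + fp * cost_fp


# Hypothetical counts for two models on the same 10,000 transactions
# (100 of them fraudulent). Model A flags aggressively; model B is cautious.
model_a = {"tp": 80, "fp": 200, "fn": 20, "tn": 9700}
model_b = {"tp": 60, "fp": 10, "fn": 40, "tn": 9890}

# Hypothetical costs: a missed fraud is 100 times costlier than a false alarm.
COST_FN, COST_FP = 100.0, 1.0

for name, cm in (("A", model_a), ("B", model_b)):
    cost = misclassification_cost(cm["fp"], cm["fn"], COST_FN, COST_FP)
    print(f"model {name}: MCC={mcc(**cm):.3f} cost={cost:.0f}")
```

With these made-up numbers, model B has the higher MCC while model A has the lower cost, so the preferred model depends on which metric drives the decision.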