Bibliographic details
Year of defense: |
2019 |
Main author: |
Matos, Raimundo Tales Benigno Rocha |
Advisor: |
Not provided by the institution |
Examination committee: |
Not provided by the institution |
Document type: |
Thesis
|
Access type: |
Open access |
Language: |
eng |
Defending institution: |
Not provided by the institution
|
Graduate program: |
Not provided by the institution
|
Department: |
Not provided by the institution
|
Country: |
Not provided by the institution
|
Keywords in Portuguese: |
|
Access link: |
http://www.repositorio.ufc.br/handle/riufc/43348
|
Abstract: |
Feature selection methods provide a way of reducing computation time, improving prediction performance, and gaining a better understanding of the data in machine learning or pattern recognition applications. Feature selection has become the focus of much research across application areas. In this work, we use feature selection to select the most relevant features in order to improve the binary classification of potential tax fraudsters. Classifying possible fraudsters from taxpayer data with binary features presents several challenges. First, the features in taxpayer data typically show low linear correlation with one another. Second, tax fraud may originate from intricate illicit schemes, which in turn requires uncovering non-linear relationships between multiple fraud indicators (features). Finally, among the features available in our experiments, only a small number show some correlation with the target class. Tax evasion represents one of the major obstacles faced by the economies of developing countries. Vast amounts of taxpayer information have been collected by fiscal agencies, opening up the possibility of devising novel techniques able to tackle fiscal evasion much more effectively than traditional approaches. In this work we propose ALICIA, a new feature selection method based on association rules and propositional logic, with a carefully crafted graph centrality measure, that attempts to tackle the above challenges while remaining agnostic to specific classification techniques. ALICIA aims to capture the intrinsic interrelation between the features in tax fraud detection. The proposed methodology is structured in three phases. First, ALICIA generates a set of relevant association rules from a set of fraud indicators (features). Next, ALICIA builds a graph in which each node represents a subset of features obtained from the association rules, while edges represent association relationships between subsets of features. 
Finally, ALICIA determines the most relevant features by applying a novel centrality measure, the Feature Topological Importance, to the vertices of the graph. We perform an extensive experimental evaluation to assess the validity of our proposal on four different real-world datasets, where we compare our solution with eight other feature selection methods. The results show that ALICIA achieves F-measure scores of up to 76.88% and consistently outperforms its competitors. |
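The three phases described in the abstract (rule mining, graph construction, centrality-based ranking) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the abstract does not define the Feature Topological Importance measure, so a plain weighted-degree centrality is used here as a stand-in. The function names, thresholds, and toy data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def mine_rules(rows, min_support=0.3, min_confidence=0.6, max_len=3):
    """Phase 1 (sketch): brute-force association rules A -> B.

    Each row is the set of binary features that are active for one record.
    """
    n = len(rows)
    items = sorted(set().union(*rows))

    # Collect every itemset of up to max_len features that is frequent enough.
    frequent = {}
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            support = sum(1 for r in rows if itemset <= r) / n
            if support >= min_support:
                frequent[itemset] = support

    # Split each frequent itemset into antecedent -> consequent.
    rules = []
    for itemset, support in frequent.items():
        for k in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), k):
                antecedent = frozenset(combo)
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


def build_graph(rules):
    """Phase 2 (sketch): nodes are feature subsets, edges are rules."""
    graph = defaultdict(dict)
    for antecedent, consequent, confidence in rules:
        graph[antecedent][consequent] = confidence
        graph.setdefault(consequent, {})
    return graph


def rank_features(graph):
    """Phase 3 (sketch): weighted-degree centrality as a STAND-IN.

    This is NOT the Feature Topological Importance from the thesis.
    Each node's score is shared equally among the features it contains.
    """
    node_score = defaultdict(float)
    for source, targets in graph.items():
        for target, weight in targets.items():
            node_score[source] += weight
            node_score[target] += weight

    feature_score = defaultdict(float)
    for node, score in node_score.items():
        for feature in node:
            feature_score[feature] += score / len(node)
    return sorted(feature_score.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy data: each set lists the indicators active for one taxpayer.
rows = [
    {"f1", "f2", "f3"},
    {"f1", "f2"},
    {"f1", "f2", "f3"},
    {"f4"},
    {"f1", "f2", "f4"},
    {"f3", "f4"},
]
rules = mine_rules(rows)
graph = build_graph(rules)
ranking = rank_features(graph)
for feature, score in ranking:
    print(feature, round(score, 3))
```

Because the ranking is computed from the rule graph alone, it does not depend on any particular classifier, which mirrors the classifier-agnostic design stated in the abstract. A feature that never takes part in a sufficiently strong rule (such as `f4` in the toy data) receives no score and is dropped.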