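A support system for anomaly detection in government data using multiple classifiers (original title: Um sistema de apoio à detecção de anomalias em dados governamentais usando múltiplos classificadores)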

Bibliographic details
Year of defense: 2021
Main author: Souza, Rafael Alexandrino Spíndola de
Advisor: Not provided by the institution
Examining committee: Not provided by the institution
Document type: Master's dissertation (Dissertação)
Access type: Open access
Language: Portuguese (por)
Defending institution: Universidade Federal da Paraíba (UFPB), Brazil; Informatics (Informática); Programa de Pós-Graduação em Informática (Graduate Program in Informatics)
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://repositorio.ufpb.br/jspui/handle/123456789/21205
Abstract: With ever-increasing amounts of data to be analyzed and correctly interpreted, Anomaly Detection (also called Outlier Detection) has become one of the areas of significant impact in Data Mining (DM). Its applications span the most diverse fields of human activity, such as medicine, administration, process management, information science, physics, and economics. In this work, we propose a non-parametric system to support the detection of aberrant events in stationary databases. The data come from the Public Administration: the Federal Government's disbursement and bidding data from 2014 to 2019, the budget data of the Municipal Health Fund of João Pessoa (PB) from 2016 to 2020, and the fleet management data of the State of Paraíba from 2017 to 2019. The proposed solution combines several supervised and unsupervised detection algorithms (OCSVM, LOF, CBLOF, HBOS, KNN, Isolation Forest, and Robust Covariance) to classify events as anomalies. The results showed that, on average, the solution correctly identified 90.07% of the events as outliers. These results indicate that the proposed solution can support government auditing activities as well as management and decision-making processes that rely on interpreting the phenomena present in the data.
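The abstract does not state how the outputs of the individual detectors are combined. As a rough illustration only, the following standard-library Python sketch assumes a simple majority vote over three of the listed detectors (KNN, LOF, and HBOS), each written in a simplified form (for example, LOF uses exactly k neighbours and does not expand ties). The function names, the `contamination` parameter (the fraction of points each detector flags), and the toy data are hypothetical and are not taken from the dissertation.

```python
import math
from collections import Counter


def knn_scores(points, k=3):
    """KNN detector: distance to the k-th nearest neighbour (larger = more isolated)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def lof_scores(points, k=3):
    """Simplified LOF: uses exactly k neighbours (ties are not expanded)."""
    n = len(points)
    neigh, kdist = [], []
    for i, p in enumerate(points):
        order = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neigh.append([j for _, j in order[:k]])
        kdist.append(order[k - 1][0])

    def reach(i, j):
        # Reachability distance of point i with respect to neighbour j.
        return max(kdist[j], math.dist(points[i], points[j]))

    # Local reachability density; the epsilon guards against duplicate points.
    lrd = [1.0 / max(sum(reach(i, j) for j in neigh[i]) / k, 1e-12) for i in range(n)]
    # LOF = mean lrd of the neighbours divided by the point's own lrd.
    return [sum(lrd[j] for j in neigh[i]) / (k * lrd[i]) for i in range(n)]


def hbos_scores(points, bins=5):
    """HBOS: sum over features of log(peak bin count / count of the point's bin)."""
    scores = [0.0] * len(points)
    for d in range(len(points[0])):
        col = [p[d] for p in points]
        lo, hi = min(col), max(col)
        width = (hi - lo) / bins or 1.0  # avoid zero width for constant features
        idx = [min(int((v - lo) / width), bins - 1) for v in col]
        counts = Counter(idx)
        peak = max(counts.values())
        for i, b in enumerate(idx):
            scores[i] += math.log(peak / counts[b])
    return scores


def top_fraction(scores, contamination):
    """Flag the highest-scoring fraction of points as anomalies."""
    n_flag = max(1, round(contamination * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_flag])
    return [i in flagged for i in range(len(scores))]


def ensemble_vote(points, k=3, bins=5, contamination=0.1):
    """Majority vote: a point is an anomaly if most detectors flag it."""
    votes = [
        top_fraction(knn_scores(points, k), contamination),
        top_fraction(lof_scores(points, k), contamination),
        top_fraction(hbos_scores(points, bins), contamination),
    ]
    return [2 * sum(v[i] for v in votes) > len(votes) for i in range(len(points))]


# Toy data (invented for illustration): a tight cluster plus one distant point.
points = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.6), (0.8, 0.4), (0.3, 0.9),
          (0.7, 0.8), (0.1, 0.3), (0.9, 0.1), (0.4, 0.5), (10.0, 10.0)]
flags = ensemble_vote(points)
print([i for i, f in enumerate(flags) if f])  # → [9]
```

Majority voting is only one possible combination rule; averaging normalized scores is another. For real data, scikit-learn (`OneClassSVM`, `LocalOutlierFactor`, `IsolationForest`, and `EllipticEnvelope` for Robust Covariance) and PyOD (`CBLOF`, `HBOS`, `KNN`) provide tested implementations of all seven detectors named in the abstract.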