Identificação de fraudes em licitações: uma abordagem utilizando agrupamento por interseção (Identifying fraud in public procurement: an approach using intersection clustering)

Bibliographic details
Year of defense: 2023
Main author: Galvão Júnior, David Pereira
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Dissertation (master's)
Access type: Open access
Language: Portuguese (por)
Defending institution: Universidade Federal da Paraíba (UFPB), Brazil, Informatics, Programa de Pós-Graduação em Informática
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Access link: https://repositorio.ufpb.br/jspui/handle/123456789/29850
Abstract: Bid rigging in public procurement auctions causes significant harm to society, reducing the effectiveness of public services such as health and education. Although it is the object of intense scrutiny by authorities seeking to mitigate the problem, identifying it is not a trivial task, since fraudsters employ sophisticated techniques. Many previous works sought to identify fraudulent bidding processes by analyzing factors such as the financial values of the submitted proposals and the behavior of the participants in a bidding process. Recently, several works have proposed using this analysis as an additional input to a machine learning algorithm in order to detect fraudulent bidding automatically. In this work, we investigate the joint participation of companies in bidding processes. To this end, we introduce a new clustering model that seeks to maximize the use of common resources by cluster members, and we develop a set of tools to solve it: an integer programming model and a branch-and-bound algorithm. Furthermore, we show that the partitioning version of this model is NP-complete, and we propose an adaptation of the silhouette function to measure the quality of the generated clusters. Additionally, we introduce a variant of this model for coverage clustering, for which we propose an enumerative algorithm and an integer programming model. In our experiments, the new clustering model outperformed literature models based on distance and on edge editing; specifically, in all cases tested, it obtained an equal or greater sum of intersections. From the resulting clusters, we sought to measure to what extent the members' joint participation in bids occurred by chance. To do so, we propose a set of metrics to describe the clusters, which are then used as an additional input to a machine learning model.
On public tender data from different countries, the models that use the metrics proposed in this work outperform the models that use metrics from the literature. On average, the proposed models achieved a gain of approximately 8% in validation correlation compared with the literature metrics.
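The abstract describes the clustering objective only informally ("maximize the use of common resources by cluster members", compared via a "sum of intersections"). As a purely illustrative sketch, assume each company is represented by the set of tenders it bid on, that a cluster's score is its size times the number of tenders shared by all of its members, and that the number of clusters k is fixed in advance. These are assumptions, not the dissertation's formulation. The code enumerates every partition into k clusters by brute force; it is not the integer programming model or the branch-and-bound algorithm from the work, and the names (`intersection_score`, `partitions_into_k`, `best_partition`, `BIDS`) and the toy data are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

# Hypothetical toy data: company -> set of tender IDs it bid on.
BIDS: Dict[str, Set[int]] = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3, 5},
    "C": {6, 7, 8},
    "D": {6, 7, 9},
}


def intersection_score(cluster: List[str], bids: Dict[str, Set[int]]) -> int:
    """Assumed objective: |C| times the number of tenders shared by all members."""
    common = set.intersection(*(bids[c] for c in cluster))
    return len(cluster) * len(common)


def partitions_into_k(items: List[str], k: int) -> Iterator[List[List[str]]]:
    """Yield every partition of `items` into exactly k non-empty blocks."""
    if k == 0:
        if not items:
            yield []
        return
    if len(items) < k:
        return
    first, rest = items[0], items[1:]
    # Case 1: `first` forms its own block; split the rest into k - 1 blocks.
    for p in partitions_into_k(rest, k - 1):
        yield [[first]] + p
    # Case 2: split the rest into k blocks and add `first` to one of them.
    for p in partitions_into_k(rest, k):
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]


def best_partition(
    bids: Dict[str, Set[int]], k: int
) -> Tuple[Optional[List[List[str]]], int]:
    """Exhaustive search for the k-partition with the largest total score."""
    best, best_score = None, -1
    for p in partitions_into_k(sorted(bids), k):
        score = sum(intersection_score(c, bids) for c in p)
        if score > best_score:
            best, best_score = p, score
    return best, best_score


if __name__ == "__main__":
    clusters, score = best_partition(BIDS, 2)
    print(sorted(sorted(c) for c in clusters), score)  # [['A', 'B'], ['C', 'D']] 10
```

Under these assumptions, grouping A with B (three shared tenders) and C with D (two shared tenders) gives 2·3 + 2·2 = 10, the best of the seven possible 2-partitions. The number of partitions grows with the Stirling numbers of the second kind, so exhaustive enumeration only suits tiny examples, which is consistent with the abstract's NP-completeness result and its use of integer programming and branch-and-bound.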