Application of machine learning techniques with variable selection for forecasting public revenues of Brazilian capitals: a case study of the transferred revenues of São Luís.

Bibliographic details
Year of defense: 2023
Main author: Pimentel, Cláudia Patrícia Silva
Advisor: Not informed by the institution
Defense committee: Not informed by the institution
Document type: Master's thesis (Dissertação)
Access type: Open access
Language: Portuguese (por)
Defense institution: Universidade Estadual do Maranhão (UEMA), Brasil – Campus São Luís, Centro de Ciências Tecnológicas (CCT), PPG1
Graduate program: Not informed by the institution
Department: Not informed by the institution
Country: Not informed by the institution
Keywords in Portuguese:
Access link: https://repositorio.uema.br/jspui/handle/123456789/2471
Abstract: Revenue forecasting is highly relevant for decision makers and for planning. In practical applications in the public sector, particularly at the municipal level, distortions are observed between budgeted and realized values even when the rules laid down in current legislation are applied. The issue has been investigated by researchers, with a history of advances in statistical regression methods and in the application of machine learning techniques, yet forecast divergences persist and the legislation requires that they be justified. In this context, it is necessary to investigate whether price and quantity effects can be identified by machine learning techniques and whether revenue forecasting errors could be mitigated if the variables were used on an accrual basis for the inflow of resources. This research therefore carries out a case study with data from São Luís to choose the variables that meet the legal prerogatives, adopting the CRISP-DM methodology: the feature-importance rankings of the ensemble algorithms Random Forests, Gradient Boosting, and XGBoost are compared with a combined model of wrapper and filter approaches, and both are submitted to the same algorithms to select the variables that yield the lowest evaluation metrics, within a sequence of lowest errors, for the transferred revenues. The document reports, as an execution record, the steps and tasks of the first CRISP-DM iteration, using data from the Transparency Portals for the period from 2010 to 2021. In the results, two data sets were compared: one with all transfers, including extraordinary amounts, and another with only the official quotas.
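The variable-ranking step described above can be sketched as follows. This is a minimal illustration with synthetic data and hypothetical predictor names (`inflation_index`, `tax_pool`, `population_proxy`), not the dissertation's actual pipeline or data; it only shows how ensemble feature importances yield a ranking of candidate revenue predictors.

```python
# Illustrative sketch: ranking candidate predictors of a transferred-revenue
# series by ensemble feature importance, as the abstract describes for
# Random Forests and Gradient Boosting. Data and variable names are made up.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

rng = np.random.default_rng(42)
# 12 years x 12 months of three hypothetical candidate variables
X = rng.normal(size=(144, 3))
# Synthetic target dominated by the second predictor
y = 2.0 * X[:, 1] + 0.5 * X[:, 0] + rng.normal(scale=0.1, size=144)

names = ["inflation_index", "tax_pool", "population_proxy"]
for model in (RandomForestRegressor(random_state=0),
              GradientBoostingRegressor(random_state=0)):
    model.fit(X, y)
    # Sort predictors from most to least important for this model
    ranking = sorted(zip(names, model.feature_importances_),
                     key=lambda pair: -pair[1])
    print(type(model).__name__, ranking)
```

In the study, rankings like these from several ensembles are compared against a combined wrapper/filter selection before re-fitting the same algorithms on the chosen subset.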
In most results, the Combined Model obtained the best metrics, especially for the extraordinary transfers, corroborating the state of the art that already favors this approach; however, the Friedman test did not reject the null hypothesis, since the metrics of the two data sets showed no significant differences. In the modeling, the RNN was complex and obtained the best metric, but, with the exception of the mining resources, the difference in values favored other algorithms, and here too the Friedman test found no significant differences. As an answer to the research question, the quantity effect could be clearly identified in both data sets, but the price effect was less evident, appearing mainly when only the official quotas were tested.
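The significance check reported above can be sketched with the Friedman test as implemented in SciPy. The error values below are invented for demonstration only; they are not the dissertation's results, and merely show how a non-significant p-value leads to not rejecting the null hypothesis of equal model performance.

```python
# Illustrative sketch: Friedman test comparing error metrics of several
# models measured on the same revenue series (values are hypothetical).
from scipy.stats import friedmanchisquare

# Hypothetical RMSE of three models on six transferred-revenue series
rf  = [0.82, 0.75, 0.91, 0.68, 0.77, 0.85]
gb  = [0.80, 0.78, 0.90, 0.70, 0.76, 0.84]
xgb = [0.81, 0.74, 0.92, 0.69, 0.78, 0.83]

stat, p = friedmanchisquare(rf, gb, xgb)
print(f"chi2 = {stat:.3f}, p = {p:.3f}")
if p > 0.05:
    # Differences between the models' metrics are not statistically significant
    print("Fail to reject H0: no significant difference between models")
```

The Friedman test is a non-parametric analogue of repeated-measures ANOVA: it ranks the models within each series and tests whether the mean ranks differ, which suits comparisons of error metrics across multiple data sets.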