Application of machine learning techniques with variable selection to forecast public revenues of Brazilian capital cities: a case study of the transferred revenues of São Luís.
Year of defense: | 2023 |
---|---|
Main author: | |
Advisor: | |
Examining committee: | |
Document type: | Dissertation |
Access type: | Open access |
Language: | Portuguese (por) |
Defending institution: | Universidade Estadual do Maranhão, Brasil, Campus São Luís, Centro de Ciências Tecnológicas – CCT, PPG1 UEMA |
Graduate program: | Not informed by the institution |
Department: | Not informed by the institution |
Country: | Not informed by the institution |
Keywords in Portuguese: | |
Access link: | https://repositorio.uema.br/jspui/handle/123456789/2471 |
Abstract: | Revenue forecasting is highly relevant for decision makers and for planning. In practical applications in the public sector, particularly at the municipal level, distortions are observed between budgeted and predicted values even when the rules provided for in current legislation are applied. The issue has been investigated by researchers, with a history of advances in statistical regression methods and in the application of machine learning techniques, yet forecast divergences persist and the legislation requires that they be justified. In this context, it is necessary to investigate whether price and quantity effects can be identified by machine learning techniques and whether revenue forecasting errors could be mitigated if the variables were used on an accrual basis for the inflow of resources. This research therefore carries out a case study with data from São Luís to select the variables that meet the legal requirements, adopting the CRISP-DM methodology and comparing the importance rankings of the ensemble algorithms Random Forests, Gradient Boosting and XGBoost with a combined model of wrapper and filter approaches, submitting both to the same algorithms in order to choose the variables that yield the lowest evaluation metrics, that is, the smallest errors for the transferred revenues. The document reports, as a comparative execution, the steps and tasks of the first CRISP-DM iteration, using data from the Transparency Portals for the period from 2010 to 2021. In the results, two data sets were compared: one with all transfers, including extraordinary amounts, and another with only the official quotas. The Combined Model obtained the best metrics in most of the results, especially for the extraordinary transfers, corroborating the state of the art that already endorses this approach; however, the Friedman test did not reject the null hypothesis, since the metrics of the two sets showed no significant differences. In the modeling, the RNN was complex and obtained the best metric, but, with the exception of the mining resources, the difference in values favored other algorithms, and the Friedman test again showed no significant differences. As an answer to the research question, the quantity effect could be clearly identified in both data sets, whereas the price effect was less evident, appearing more clearly when only the official quotas were tested. |
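As an illustration of the approach described in the abstract, the sketch below is not the dissertation's actual pipeline: the data, feature names and hyperparameters are invented placeholders, and it assumes scikit-learn, XGBoost and SciPy are available. It shows, in minimal form, how feature-importance rankings from Random Forests, Gradient Boosting and XGBoost can be compared, and how the Friedman test can be applied to fold-wise error metrics of the competing models.

```python
# Minimal sketch (not the dissertation's actual pipeline): ranking feature
# importance with the three ensemble algorithms cited in the abstract and
# comparing fold-wise error metrics with the Friedman test.
# Data, feature names and hyperparameters are illustrative placeholders.
import numpy as np
from scipy.stats import friedmanchisquare
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold
from xgboost import XGBRegressor  # assumes the xgboost package is installed

rng = np.random.default_rng(42)

# Hypothetical monthly observations of transferred-revenue drivers
# (price-effect and quantity-effect proxies invented for illustration).
features = ["ipca_index", "gdp_proxy", "fpm_quota", "icms_quota", "population"]
X = rng.normal(size=(144, len(features)))          # 2010-2021 -> 144 months
y = 2.0 * X[:, 2] + 1.5 * X[:, 3] + rng.normal(scale=0.5, size=144)

models = {
    "RandomForest": RandomForestRegressor(n_estimators=200, random_state=0),
    "GradientBoosting": GradientBoostingRegressor(random_state=0),
    "XGBoost": XGBRegressor(n_estimators=200, random_state=0, verbosity=0),
}

# 1) Importance ranking: fit each ensemble and list features by importance.
for name, model in models.items():
    model.fit(X, y)
    ranking = sorted(zip(features, model.feature_importances_),
                     key=lambda pair: pair[1], reverse=True)
    print(name, [f"{feat}={imp:.2f}" for feat, imp in ranking])

# 2) Friedman test: compare fold-wise MAE of the three models; failing to
#    reject the null hypothesis means no significant difference in errors.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
fold_mae = {name: [] for name in models}
for train_idx, test_idx in cv.split(X):
    for name, model in models.items():
        model.fit(X[train_idx], y[train_idx])
        fold_mae[name].append(
            mean_absolute_error(y[test_idx], model.predict(X[test_idx])))

stat, p_value = friedmanchisquare(*fold_mae.values())
print(f"Friedman statistic={stat:.3f}, p-value={p_value:.3f}")
```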