Learning beyond the spatial autocorrelation structure: A machine learning-based approach to discovering new patterns and relationships in the context of spatially contextualized modeling of voting behavior

Bibliographic details
Year of defense: 2023
Main author: Silva, Tiago Pinho da
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: eng
Defending institution: Biblioteca Digitais de Teses e Dissertações da USP
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Link de acesso: https://www.teses.usp.br/teses/disponiveis/55/55134/tde-15012024-174102/
Abstract: Elections are a cornerstone of democratic societies, providing citizens with the means to elect their representatives and shape the direction of their government. However, recent years have seen growing concern about the integrity of electoral processes worldwide, with allegations of fraud and rising polarization. To better comprehend the electorate and the factors influencing its choices, an increasing number of researchers have turned to electoral behavior modeling, which sheds light on political phenomena such as polarization and the demographic and socioeconomic contexts shaping the nature of the electorate. The literature on electoral behavior modeling can be broadly divided into two main areas: political science, which argues that only individual factors explain electoral behavior and relies primarily on survey data; and electoral geography, which asserts that contextual factors, such as location, play a crucial role in determining electoral behavior and relies on datasets with spatially aggregated information such as census data. Political science has become the dominant approach due to the increased quality of data collected from surveys, but the public availability of such data is limited and costly. In contrast, census data, which provides detailed information about a population's socioeconomic and demographic characteristics, is made publicly available by government agencies. However, despite their potential for providing comprehensive and insightful information on the electorate, large census datasets are underutilized in modeling electoral behavior, mainly due to the limitations of regression analysis in handling high-dimensional data and identifying non-linear relationships. To address these limitations, there has been a growing trend towards using machine learning methods that can better handle high dimensionality and model non-linear relationships. However, most of these works neglect the spatial characteristics of the data.
This thesis argues for the importance of incorporating spatial dependence information in the machine learning pipeline for the task of electoral behavior modeling using census data. The traditional machine learning pipeline can exhibit bias towards models that learn the spatial autocorrelation structure, hindering the discovery of novel patterns and relationships beyond this structure, which contradicts the main objective of identifying new patterns and relationships. In this thesis, the impact of spatial dependence on the task of electoral behavior modeling is studied, and adaptations to the traditional machine learning pipeline are proposed, developed, and evaluated for this task. In this regard, we propose two Spatial Cross-Validation techniques that take into account the spatial aspects of the data and provide scenarios for the evaluation of machine learning models without the influence of spatial dependence. Moreover, we propose a stacking-based machine learning approach to model the data based on geographical contexts and identify local and global relationships to understand the election results. The results in this thesis indicate that the proposed approaches are well-suited to the task of spatially contextualized modeling of electoral behavior. The validation techniques provided more realistic and less biased scenarios than existing approaches in the literature, and the machine learning approach outperformed the state of the art in the literature while providing interpretable results. Overall, this research advances the state of the art in electoral behavior modeling and provides a novel methodology in the electoral behavior area, paving the way for new machine learning approaches to help understand election results.
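Illustrative note: the abstract does not describe how the two proposed Spatial Cross-Validation techniques work, so the sketch below is not the thesis's method. It shows a generic, commonly used idea (grid-block cross-validation with a buffer zone) to illustrate how spatial dependence between training and test data might be reduced. The function name `spatial_block_folds`, its parameters `block_size` and `buffer`, and the toy coordinates are all assumptions made for this example.

```python
import math
from collections import defaultdict


def spatial_block_folds(coords, block_size, buffer=0.0):
    """Yield (train_idx, test_idx) pairs, one per occupied grid block.

    Hypothetical helper, not taken from the thesis. Each point is assigned
    to a square grid cell of side `block_size`. Every occupied cell becomes
    a test fold in turn. Training points lying within `buffer` distance of
    any test point are dropped, so nearby (spatially autocorrelated)
    observations do not leak into training.
    """
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        key = (math.floor(x / block_size), math.floor(y / block_size))
        blocks[key].append(i)

    for key in sorted(blocks):
        test_idx = blocks[key]
        test_set = set(test_idx)
        train_idx = [
            i
            for i in range(len(coords))
            if i not in test_set
            and all(math.dist(coords[i], coords[j]) > buffer for j in test_idx)
        ]
        yield train_idx, test_idx


# Toy coordinates (made up for illustration), e.g. centroids of census units.
coords = [(0.1, 0.1), (0.9, 0.2), (1.1, 0.2), (2.5, 0.5), (2.6, 2.6)]
folds = list(spatial_block_folds(coords, block_size=1.0, buffer=0.5))

for train_idx, test_idx in folds:
    print("test:", test_idx, "train:", train_idx)
```

In the first fold, point 2 is excluded from training because it lies within the buffer of test point 1, even though it falls in a different block. A plain random split would instead tend to place such neighbors on both sides of the split, which is the kind of optimistic bias the abstract attributes to the traditional pipeline.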