Text Mining Using Word Embeddings with Geographic Context (Mineração de Textos usando Word Embeddings com Contexto Geográfico)

Bibliographic details
Year of defense: 2022
Main author: Antônio Ronaldo da Silva
Advisor: Ricardo Marcondes Marcacini
Defense committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: por
Defending institution: Fundação Universidade Federal de Mato Grosso do Sul
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Brazil
Keywords in Portuguese:
Access link: https://repositorio.ufms.br/handle/123456789/5420
Abstract: Many important phenomena are tied to a geographic context, such as events extracted from textual sources on economics, public health, and urban violence. Analyzing these events manually is impractical given their volume and the variety of data sources. This creates a need for intelligent computational methods, such as Text Mining, that can explore textual content with geographic information and reveal patterns that traditional models would miss. The traditional approach to analyzing the relationship between terms and regions is to estimate the probability that a term is used in texts associated with a region, generally from the frequency of terms in each region. However, this approach is known to fail for terms the model has not seen before and for texts with ambiguous terms. Models based on Word Embeddings are recognized for improving the identification of relationships between a word and its possible associated location. This project therefore investigates textual representations based on Word Embeddings from BERT (Bidirectional Encoder Representations from Transformers) models in a fine-tuning process in which the georeferenced information of the texts is used as context. We named this proposal the GeoTransformers Language Model. One distinguishing feature of the proposal is that it automatically identifies macro-regions and micro-regions from the events and uses them as context for fine-tuning a language model. Compared with other models in the literature, the GeoTransformers model achieved higher precision, recall, and F1-score. Moreover, it was the only model capable of handling regions with fewer events.
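As an illustration of the frequency-based approach the abstract describes as traditional, the following minimal sketch estimates the probability of a region given a term from raw co-occurrence counts. It is not taken from the dissertation: the function names, the whitespace tokenization, and the toy corpus are all assumptions made for the example. It also shows the limitation the abstract points out, namely that a term never seen in training yields no estimate at all.

```python
from collections import Counter, defaultdict


def train_term_region_counts(docs):
    """Count how often each term appears in texts tagged with each region.

    docs: iterable of (text, region) pairs (hypothetical input format).
    Returns a mapping term -> Counter of region occurrences.
    """
    counts = defaultdict(Counter)
    for text, region in docs:
        # Naive lowercase whitespace tokenization, assumed for simplicity.
        for term in text.lower().split():
            counts[term][region] += 1
    return counts


def region_probabilities(counts, term):
    """Estimate P(region | term) by relative frequency.

    Returns an empty dict for terms never seen in training, which is
    where a purely frequency-based model has nothing to say.
    """
    region_counts = counts.get(term.lower())
    if not region_counts:
        return {}
    total = sum(region_counts.values())
    return {region: n / total for region, n in region_counts.items()}


# Toy, made-up corpus: (text, region) pairs.
docs = [
    ("flood warning issued downtown", "north"),
    ("flood damages crops", "south"),
    ("flood closes highway", "north"),
    ("soybean harvest begins", "south"),
]

counts = train_term_region_counts(docs)
probs_flood = region_probabilities(counts, "flood")       # seen term
probs_unseen = region_probabilities(counts, "landslide")  # unseen term
```

In this toy corpus "flood" appears twice in "north" texts and once in "south" texts, so the estimate favors "north", while "landslide" returns an empty result. Embedding-based models, such as the fine-tuned BERT approach the abstract proposes, can instead relate an unseen or ambiguous term to regions through its contextual representation.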