Heuristic-based toponym resolution in unstructured texts (Resolução de topônimos em textos não estruturados baseada em heurísticas)

Bibliographic details
Year of defense: 2022
Main author: Sá, Breno Alef Dourado
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Dissertation
Access type: Open access
Language: Portuguese (por)
Defending institution: Not provided by the institution
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://www.repositorio.ufc.br/handle/riufc/69874
Abstract: Every day, people use place names and spatial relationships to give directions and to report where events happen. Mentions of places, also called toponyms, appear in many types of documents with geographic content, such as articles, blogs, reports and police reports. The geographic information extracted from these documents can be used in emergency response, epidemic monitoring, news gathering and tourism planning, among other applications. However, because unstructured texts lack metadata, extracting this information from them is not a trivial task. One of the challenges in this process is mapping toponyms to geographic coordinates, because place names are ambiguous and commonly have homonyms. The process of resolving toponyms to their coordinates, which consists of obtaining candidate locations and disambiguating them, is called geocoding. The present work proposes and evaluates two heuristics for geocoding: normalization of adjectival toponyms and geometric optimization by toponym type. Initially, a baseline geocoder is defined through experiments with heuristics. Then, two geocoders are created by modifying the baseline to use each of the heuristics proposed in this work. Finally, a third geocoder is similarly created to use the combination of the two heuristics. The results indicate that these heuristics improve geocoding performance compared to the baseline, and even surpass state-of-the-art geocoders on the datasets evaluated.
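
The two-step process the abstract describes (obtain candidates, then disambiguate) can be illustrated with a minimal Python sketch. It is not the author's implementation: the lookup table `ADJECTIVE_TO_PLACE`, the toy `GAZETTEER`, and the functions `normalize`, `haversine_km`, `total_distance_km` and `resolve` are all hypothetical names introduced here. For disambiguation it uses a simple criterion, choosing the combination of candidates with the smallest total pairwise distance, which stands in for, and is much simpler than, the geometric optimization by toponym type evaluated in the thesis.

```python
from itertools import combinations, product
from math import asin, cos, radians, sin, sqrt

# Hypothetical lookup table: adjectival form -> place name (illustrative only).
ADJECTIVE_TO_PLACE = {"brazilian": "Brazil", "parisian": "Paris"}

# Hypothetical toy gazetteer: place name -> candidate (label, latitude, longitude).
GAZETTEER = {
    "Paris": [
        ("Paris, Texas, USA", 33.6609, -95.5555),
        ("Paris, France", 48.8566, 2.3522),
    ],
    "Lyon": [("Lyon, France", 45.7640, 4.8357)],
}


def normalize(mention):
    """Map an adjectival toponym to its place name; leave other mentions as-is."""
    return ADJECTIVE_TO_PLACE.get(mention.lower(), mention)


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (label, lat, lon) candidates."""
    lat1, lon1, lat2, lon2 = map(radians, (a[1], a[2], b[1], b[2]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def total_distance_km(candidates):
    """Sum of pairwise distances within one combination of candidates."""
    return sum(haversine_km(a, b) for a, b in combinations(candidates, 2))


def resolve(mentions):
    """Pick one candidate per mention so that the chosen places lie close together."""
    # Step 1: obtain candidates, skipping mentions the gazetteer does not know.
    found = [(m, GAZETTEER[normalize(m)]) for m in mentions if normalize(m) in GAZETTEER]
    # Step 2: disambiguate by exhaustive search over all candidate combinations.
    best = min(product(*(cands for _, cands in found)), key=total_distance_km)
    return {m: cand for (m, _), cand in zip(found, best)}


if __name__ == "__main__":
    for mention, (label, lat, lon) in resolve(["Parisian", "Lyon"]).items():
        print(mention, "->", label, (lat, lon))
```

In this toy example the adjectival mention "Parisian" is first normalized to "Paris", and the French candidate is then preferred because it lies far closer to Lyon than the Texan homonym does. The exhaustive search grows exponentially with the number of ambiguous mentions, so it is only suitable for illustration.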