Resolução de correferências em língua portuguesa : pessoa, local e organização

Detalhes bibliográficos
Ano de defesa: 2014
Autor(a) principal: Fonseca, Evandro Brasil lattes
Orientador(a): Vieira, Renata lattes
Banca de defesa: Não Informado pela instituição
Tipo de documento: Dissertação
Tipo de acesso: Acesso aberto
Idioma: por
Instituição de defesa: Pontifícia Universidade Católica do Rio Grande do Sul
Programa de Pós-Graduação: Programa de Pós-Graduação em Ciência da Computação
Departamento: Faculdade de Informáca
País: BR
Palavras-chave em Português:
Área do conhecimento CNPq:
Link de acesso: http://tede2.pucrs.br/tede2/handle/tede/5257
Resumo: Coreference resolution is a process that consists in identifying the several forms that a specific named entity may assume on certain text. In other words, this process consists in identifying certain terms and expressions that refer certain named entity. The automatic textual coreference resolution is in a very important context in the Natural Language Processing (NLP) area, because several systems need itstasks, such as the relation extraction between named entities. The linguistic processing level depends on the knowledgeabout the world, and this is a challenge for thisarea, mainly for the Portuguese language. The growing necessity of NLP tools and the lack of open source resources for Portuguese have inspired the research on this language, and they became the focus of this dissertation. The present work aims at building an open source tool for the Coreference resolution in Portuguese, focusing on the Person, Location and Organization domains. These three categories were chosen given their relevance for most NLP tasks, because they represent more specifically entities of common interest.Furthermore, they are the most explored categories in the related works. The choice for working only with open source resourcesis because most of related works forPortuguese usesprivate software, which limits his availability and his usability.The methodology is based on supervised machine learning. For this task, the use of features that help on the correct classification of noun phrase pairs as coreferent or non-coreferent are essential for grouping them later, thus building coreference chains.Although there are still many challenges to be overcome, the results of the system described in this dissertationare encouraging when compared indirectly, by using the same metric,to the current state of the art.