A resolução de anáforas pronominais da língua portuguesa com base no algoritmo de Mitkov

Bibliographic details
Year of defense: 2007
Main author: Chaves, Amanda Rocha
Advisor: Rino, Lúcia Helena Machado
Defense committee: Not provided by the institution
Document type: Dissertation
Access type: Open access
Language: por
Defending institution: Universidade Federal de São Carlos
Graduate program: Programa de Pós-Graduação em Ciência da Computação - PPGCC
Department: Not provided by the institution
Country: BR
Keywords in Portuguese:
CNPq knowledge area:
Access link: https://repositorio.ufscar.br/handle/20.500.14289/351
Abstract: One of the problems natural language processing systems face is ensuring referential cohesion in a text. This property connects the constituents of a text and makes it readable. We address anaphora as one of the main factors of referential cohesion. An anaphor establishes a reference relationship between two or more text components, and its interpretation depends on the interpretation of its antecedent. This work is limited to pronominal anaphors and therefore to automatic pronoun resolution. Several algorithms have been proposed to this end. They usually involve (1) identifying the anaphoric component; (2) determining the set of its possible antecedents; and (3) selecting the most likely antecedent of the anaphor. Without anaphora resolution, applications such as information extraction, automatic translation, or summarization may produce non-cohesive texts. We present an adaptation of Mitkov's algorithm for pronoun resolution that specifically addresses third-person pronouns in Brazilian Portuguese whose antecedents are noun phrases. The approach was evaluated intrinsically on annotated corpora whose annotations include morphological, syntactic, and coreferential information. It was also compared with the Lappin and Leass algorithm for pronoun resolution, adapted to Portuguese. The evaluation measure adopted was the success rate, defined as the ratio between the number of anaphors correctly resolved by the system and the total number of anaphors in the text. The results of both evaluations are discussed.
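
The three steps and the success rate named in the abstract could be sketched roughly as below. This is a minimal illustration, not the dissertation's implementation: the `Mention` type, the gender/number agreement filter, and the nearest-candidate selection rule are all assumptions made for the example. The adapted Mitkov algorithm selects antecedents by its own criteria, which the abstract does not detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """Hypothetical annotated unit: a noun phrase or a pronoun."""
    text: str
    position: int          # token index in the text
    gender: str            # "m" or "f"
    number: str            # "sg" or "pl"
    is_pronoun: bool = False


def candidates_for(pronoun, mentions):
    """Step 2: noun phrases that precede the pronoun and agree with it."""
    return [
        m for m in mentions
        if not m.is_pronoun
        and m.position < pronoun.position
        and m.gender == pronoun.gender
        and m.number == pronoun.number
    ]


def resolve(pronoun, mentions):
    """Step 3: select one candidate (placeholder rule: the nearest one)."""
    candidates = candidates_for(pronoun, mentions)
    if not candidates:
        return None
    return max(candidates, key=lambda m: m.position)


def success_rate(mentions, gold):
    """Correctly resolved anaphors divided by all anaphors in the text."""
    pronouns = [m for m in mentions if m.is_pronoun]  # Step 1
    if not pronouns:
        return 0.0
    correct = sum(1 for p in pronouns if resolve(p, mentions) == gold[p])
    return correct / len(pronouns)


# Toy input for "A menina viu o gato. Ela o chamou." (invented for this sketch)
menina = Mention("a menina", 0, "f", "sg")
gato = Mention("o gato", 3, "m", "sg")
ela = Mention("ela", 5, "f", "sg", is_pronoun=True)
o = Mention("o", 6, "m", "sg", is_pronoun=True)
mentions = [menina, gato, ela, o]
gold = {ela: menina, o: gato}  # hypothetical gold coreference annotation

print(success_rate(mentions, gold))  # prints 1.0
```

Here both pronouns are resolved correctly, so the success rate is 2/2. Swapping the placeholder rule in `resolve` for a different selection strategy leaves the evaluation code unchanged, which is how two resolution algorithms can be compared on the same annotated corpus.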