Extração de relações do domínio de organizações para o português [Relation extraction in the Organization domain for Portuguese]

Abreu, Sandra Collovini de

Bibliographic details
Year of defense:	2014
Main author:	Abreu, Sandra Collovini de
Advisor:	Vieira, Renata
Examining committee:	Not provided by the institution
Document type:	Thesis
Access type:	Open access
Language:	por (Portuguese)
Institution of defense:	Pontifícia Universidade Católica do Rio Grande do Sul
Graduate program:	Programa de Pós-Graduação em Ciência da Computação
Department:	Faculdade de Informática
Country:	BR
Keywords (Portuguese):	INFORMÁTICA; PROCESSAMENTO DA LINGUAGEM NATURAL; RECUPERAÇÃO DA INFORMAÇÃO; ONTOLOGIA
CNPq knowledge area:	CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
Access link:	http://tede2.pucrs.br/tede2/handle/tede/5248
Abstract:	Relation Extraction from texts is one of the main challenges in Information Extraction, given the linguistic knowledge it requires and the sophistication of the language processing techniques employed. The task aims to identify and classify semantic relations that occur between entities recognized in a given text. For example, the sentence "Next Saturday, Ronaldo Lemos, director of Creative Commons, will participate in a debate [...]" expresses an "institutional bond" relation between the named entities "Ronaldo Lemos" and "Creative Commons". This thesis proposes a process for extracting relation descriptors, which describe the explicit relations between named entities in the Organization domain (Person, Organization and Location). The process applies Conditional Random Fields (CRF) to texts in Portuguese; CRF is a probabilistic model that has been used efficiently in various sequential text processing tasks, including Relation Extraction. To implement the proposed process, a reference corpus for relation extraction, necessary for learning, was manually annotated on top of a reference corpus for named entities (HAREM). Features of different types were defined based on an extensive review of the literature on automatic relation extraction. An experimental evaluation assessed the learned model with the defined features under different input feature configurations for CRF. Among them, the highlight was the inclusion of a semantic feature based on the named entity category, since this feature better expresses the kind of relation between the pair of named entities to be identified. The best results were obtained for relations between named entities of the Organization and Person categories, with F-measures of 57% and 63% for correct and partially correct extractions, respectively.
