Bibliographic details
Year of defense: |
2017 |
Main author: |
Amaral, Daniela Oliveira Ferreira do
 |
Advisor: |
Vieira, Renata
 |
Defense committee: |
Not provided by the institution |
Document type: |
Thesis
|
Access type: |
Open access |
Language: |
Portuguese (por) |
Defending institution: |
Pontifícia Universidade Católica do Rio Grande do Sul
|
Graduate program: |
Programa de Pós-Graduação em Ciência da Computação
|
Department: |
Escola Politécnica
|
Country: |
Brazil
|
Keywords in Portuguese: |
|
Keywords in English: |
|
CNPq knowledge area: |
|
Access link: |
http://tede2.pucrs.br/tede2/handle/tede/8035
|
Abstract: |
The treatment of textual information has become increasingly relevant in many domains. One of the first tasks in extracting information from texts is Named Entity Recognition (NER), which consists of identifying references to certain entities and determining their classification. NER has been applied in many domains, the most common being medicine and biology. One of the challenging domains for the recognition of Named Entities (NE) is Geology, an area that lacks computational linguistic resources. This thesis proposes a method for recognizing relevant NE in the field of Geology, specifically in the subarea of Brazilian Sedimentary Basin, in Portuguese texts. Generic and geological features were defined for the generation of a machine learning model. Among the automatic approaches to NE classification, the most prominent is the Conditional Random Fields (CRF) probabilistic model. CRF has been used effectively for processing natural language text. To generate our model, we created GeoCorpus, a reference corpus for Geological NER, annotated by specialists. Experimental evaluations were performed to compare the proposed method with other classifiers. The best results were achieved by CRF, with 76.78% Precision and 54.33% F-Measure. |