Comparative study of different machine-learning-based classifiers for the Named Entity Recognition process

Santos, Jadson da Silva


Bibliographic details
Year of defense:	2016
Main author:	Santos, Jadson da Silva
Advisor:	Rocha Júnior, João Batista da
Examination committee:	Not provided by the institution
Document type:	Master's dissertation
Access type:	Open access
Language:	Portuguese (por)
Institution:	Universidade Estadual de Feira de Santana
Graduate program:	Master's Program in Applied Computing (Mestrado em Computação Aplicada)
Department:	Department of Technology (Departamento de Tecnologia)
Country:	Brazil
Keywords:	Named Entity Recognition; Machine Learning; Information Extraction; Natural Language Processing
CNPq knowledge area:	Computing Methodology and Techniques :: Information Systems
Access link:	http://localhost:8080/tede/handle/tede/554
Abstract:	The Named Entity Recognition (NER) process is the task of identifying relevant terms in texts and assigning them a label. Such terms can refer to names of people, organizations, and places. A wide variety of techniques can be applied to NER, and they can be grouped into three distinct approaches: rule-based, machine learning, and hybrid. For machine learning approaches, several factors may influence accuracy, including the selected classifier, the set of features extracted from the terms, the characteristics of the text corpora, and the number of entity labels. In this work, we compare machine-learning classifiers applied to the NER task. The comparative study covers classifiers based on CRF (Conditional Random Fields), MEMM (Maximum Entropy Markov Model), and HMM (Hidden Markov Model), evaluated on two Portuguese corpora, derived from WikiNer and HAREM, and two English corpora, derived from CoNLL-03 and WikiNer. The comparison shows that CRF outperforms the other classifiers on both Portuguese and English texts. This study also compares the individual and joint contributions of features, including contextual features, and compares NER performance per named entity label across classifiers and corpora.
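As an illustration of what "contextual features" usually means for feature-based sequence classifiers such as CRF and MEMM, the following is a minimal Python sketch, assuming a fixed window of neighbouring words around each token. The feature names and the `token_features` / `sentence_features` helpers are hypothetical and are not the feature set used in the dissertation; an HMM, by contrast, conditions only on the word and the previous label, which is one reason it cannot directly exploit overlapping features like these.

```python
from typing import Dict, List, Union

Feature = Union[str, bool]


def token_features(tokens: List[str], i: int, window: int = 1) -> Dict[str, Feature]:
    """Hypothetical feature extractor: orthographic features of tokens[i]
    plus contextual features from neighbours within `window` positions."""
    word = tokens[i]
    feats: Dict[str, Feature] = {
        # Features of the token itself.
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isupper": word.isupper(),
        "word.isdigit": word.isdigit(),
        "word.suffix3": word[-3:],
    }
    # Contextual features: neighbouring words at offsets -window..+window.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        key = f"{offset:+d}"  # e.g. "-1" or "+1"
        if 0 <= j < len(tokens):
            neighbour = tokens[j]
            feats[f"{key}:word.lower"] = neighbour.lower()
            feats[f"{key}:word.istitle"] = neighbour.istitle()
        else:
            # Outside the sentence: mark the boundary explicitly.
            feats[f"{key}:word.lower"] = "<BOS>" if j < 0 else "<EOS>"
    return feats


def sentence_features(tokens: List[str], window: int = 1) -> List[Dict[str, Feature]]:
    """One feature dict per token, the usual input shape for a linear-chain CRF."""
    return [token_features(tokens, i, window) for i in range(len(tokens))]


if __name__ == "__main__":
    sentence = ["Maria", "mora", "em", "Feira", "de", "Santana"]
    for tok, f in zip(sentence, sentence_features(sentence)):
        print(tok, f)
```

Comparing the "individual and joint contribution of features" then amounts to training the same classifier with subsets of these keys enabled or disabled and measuring the change in per-label scores.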

