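Estudo comparativo de diferentes classificadores baseados em aprendizagem de máquina para o processo de Reconhecimento de Entidades Nomeadas [Comparative study of different machine-learning-based classifiers for the Named Entity Recognition process]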

Bibliographic details
Year of defense: 2016
Main author: Santos, Jadson da Silva
Advisor: Rocha Júnior, João Batista da
Defense committee: Not provided by the institution
Document type: Master's thesis
Access type: Open access
Language: por
Degree-granting institution: Universidade Estadual de Feira de Santana
Graduate program: Mestrado em Computação Aplicada
Department: DEPARTAMENTO DE TECNOLOGIA
Country: Brazil
Keywords in Portuguese:
Keywords in English:
CNPq knowledge area:
Access link: http://localhost:8080/tede/handle/tede/554
Abstract: Named Entity Recognition (NER) is the task of identifying relevant terms in texts and assigning them a label. Such terms can reference names of people, organizations, and places. A wide variety of techniques can be used for named entity recognition. These techniques can be classified into three distinct approaches: rule-based, machine learning, and hybrid. For machine learning approaches, several factors may influence accuracy, including the selected classifier, the set of features extracted from the terms, the characteristics of the textual bases, and the number of entity labels. In this work, we compare machine learning classifiers applied to the NER task. The comparative study includes classifiers based on CRF (Conditional Random Fields), MEMM (Maximum Entropy Markov Model), and HMM (Hidden Markov Model), which are compared on two Portuguese corpora derived from WikiNer and HAREM, and two English corpora derived from CoNLL-03 and WikiNer. The comparison shows that CRF is superior to the other classifiers on both Portuguese and English texts. The study also compares the individual and joint contributions of features, including contextual features, as well as NER performance per named entity label across classifiers and corpora.