Aprendizagem profunda para reconhecimento de entidades nomeadas em domínio jurídico

Castro, Pedro Vitor Quinta de

Bibliographic details
Year of defense:	2019
Main author:	Castro, Pedro Vitor Quinta de
Advisor:	Silva, Nádia Félix Felipe da
Examination committee:	Silva, Nádia Félix Felipe da; Rosa, Thierson Couto; Soares, Anderson da Silva; Caseli, Helena de Medeiros
Document type:	Master's thesis
Access type:	Open access
Language:	por
Degree-granting institution:	Universidade Federal de Goiás
Graduate program:	Programa de Pós-graduação em Ciência da Computação (INF)
Department:	Instituto de Informática - INF (RG)
Country:	Brazil
Keywords in Portuguese:	Reconhecimento de entidades nomeadas; Processamento de linguagem natural; Deep learning; Redes neurais; Língua portuguesa; Direito do trabalho
Keywords in English:	Named entity recognition; Natural language processing; Deep learning; Neural networks; Portuguese language; Labor law
CNPq knowledge area:	CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
Access link:	http://repositorio.bc.ufg.br/tede/handle/tede/10276
Abstract:	Named Entity Recognition (NER) is a challenging Natural Language Processing task for a language as rich as Portuguese. When applied to a specific domain, the task acquires a further layer of complexity, since it must handle a lexicon particular to that domain. This work studies the Legal domain, specifically Brazilian Labor Law. Deep Learning architectures with word representations based on static word embeddings and language models have shown state-of-the-art performance on the NER task. This work uses a model based on Deep Neural Networks and evaluates different forms of word representation. The evaluated models are applied to the Portuguese language, in both the Legal and general domains. To this end, language models based on the ELMo architecture were trained for both domains, along with static word embeddings specific to the Legal domain. A comparative study of the types of word embeddings applied to the NER task identifies the best type of pre-trained word embeddings for each domain. To train the Legal domain NER models, ELMo, and static word embeddings, two different corpora were produced and annotated, based on a collection of public documents from the Brazilian Labor Court. The Portuguese general domain NER model achieved a new state-of-the-art result on the HAREM benchmark, with an F-Score of 83.22% in the selective scenario and 78.04% in the total scenario. For the Brazilian Labor Law domain, a model with an F-Score of 93.81% was obtained.
