Named entity recognition in informal texts in the legislative domain

Bibliographic details
Year of defense: 2023
Main author: Costa, Rosimeire Pereira da
Advisor: Silva, Nádia Félix Felipe da
Examination committee: Silva, Nádia Félix Felipe da, Souza, Ellen Polliana Ramos, Silva, Sérgio Francisco da, Fernandes, Deborah Silva Alves
Document type: Master's dissertation
Access type: Open access
Language: Portuguese (por)
Degree-granting institution: Universidade Federal de Goiás
Graduate program: Graduate Program in Computer Science (Programa de Pós-graduação em Ciência da Computação, INF)
Department: Instituto de Informática - INF (RMG)
Country: Brazil
Keywords (Portuguese):
Keywords (English):
CNPq knowledge area:
Access link: http://repositorio.bc.ufg.br/tede/handle/tede/12862
Abstract: Named Entity Recognition (NER) is a challenging Natural Language Processing (NLP) task for a language as rich as Portuguese. When the task is applied to informal language and short texts, it gains an additional layer of complexity, since it must handle a lexicon specific to the domain in question. In this work, we expand the UlyssesNER-Br corpus for the NER task with Brazilian Portuguese comments on bills. We also enrich the annotated set with a formal corpus to analyze whether combining formal and informal texts from the same domain improves the performance of NER models. Finally, we run experiments with a Conditional Random Fields (CRF) model and a Bidirectional LSTM-CRF (BiLSTM-CRF) model, and then fine-tune the BERT and RoBERTa language models on the NER task with our dataset. We conclude that the formal texts helped identify entities in the informal texts. The best model was the fine-tuned BERT, which achieved an F1-score of 74.63%, surpassing the results reported in related work.
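The abstract reports an F1-score but this record does not state the evaluation protocol. A common convention for NER is entity-level micro-averaged F1, where a predicted entity counts as correct only if both its span and its type exactly match a gold entity. The sketch below illustrates that convention under stated assumptions: tags follow the BIO scheme, a stray `I-` tag that does not continue an open entity of the same type is ignored (a strict reading; other tools are more lenient), and the labels `PER`, `LOC`, `ORG` are hypothetical placeholders, not the UlyssesNER-Br label set. It is not the dissertation's evaluation code.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence (strict: spans open only at B-)."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a span still open at the end
        closes = tag == "O" or tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype)
        if closes:
            if start is not None:
                spans.add((start, i, etype))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
        # an I- tag of the same type as the open span simply extends it
    return spans


def micro_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Entity-level micro F1: exact span and type match, counts pooled over all sentences."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = extract_spans(g), extract_spans(p)
        tp += len(gold_spans & pred_spans)
        fp += len(pred_spans - gold_spans)
        fn += len(gold_spans - pred_spans)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Hypothetical example: the person entity matches, the second entity has the wrong type.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-ORG"]]
print(micro_f1(gold, pred))  # 1 TP, 1 FP, 1 FN -> precision = recall = 0.5
```

Because the wrongly typed entity is counted as both a false positive and a false negative, this metric is stricter than token-level accuracy, which would credit three of the four tokens in the example.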