Detalhes bibliográficos
Ano de defesa: |
2023 |
Autor(a) principal: |
Costa, Rosimeire Pereira da
 |
Orientador(a): |
Silva, Nádia Félix Felipe da
 |
Banca de defesa: |
Silva, Nádia Félix Felipe da,
Souza, Ellen Polliana Ramos,
Silva, Sérgio Francisco da,
Fernandes, Deborah Silva Alves |
Tipo de documento: |
Dissertação
|
Tipo de acesso: |
Acesso aberto |
Idioma: |
por |
Instituição de defesa: |
Universidade Federal de Goiás
|
Programa de Pós-Graduação: |
Programa de Pós-graduação em Ciência da Computação (INF)
|
Departamento: |
Instituto de Informática - INF (RMG)
|
País: |
Brasil
|
Palavras-chave em Português: |
|
Palavras-chave em Inglês: |
|
Área do conhecimento CNPq: |
|
Link de acesso: |
http://repositorio.bc.ufg.br/tede/handle/tede/12862
|
Resumo: |
Named Entity Recognition (NER) is a challenging task in Natural Language Processing (NLP) for a language as rich as Portuguese. When applied in a scenario appropriate to informal language and short texts, the task acquires a new layer of complexity, manipulating a lexicon specific to the domain in question. In this work, we expand the UlyssesNER-Br corpus for the NER task with Brazilian Portuguese comments on bill projects. Additionally, we enriched the annotated set with a formal corpus in order to analyze whether the combination of formal and informal texts from the same domain could improve the performance of NER models. Finally, we conducted experiments with a Conditional Random Fields (CRF) model, a Bidirectional LSTM-CRF model (BiLSTM-CRF), and subsequently fine-tuned a BERT and RoBERTa language model on the NER task with our dataset. We conclude that formal texts aided in identifying entities in informal texts. The best model was the fine-tuning of BERT which achieved an F1- score of 74.63%, surpassing the benchmark of related works. |