Lexicon annotation with LLM: a proof of concept with ChatGPT

Bibliographic details
Main author: Marcondes, Francisco Supino
Publication date: 2025
Other authors: Gala, Adelino de C.O.S.; Rodrigues, Manuel; Almeida, J. J.; Novais, Paulo
Language: eng
Source title: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Full text: https://hdl.handle.net/1822/95171
Abstract: Lexicon annotation is a critical yet time-consuming task that can hold back the progress of language-intensive projects. This paper explores the potential of Large Language Models (LLMs) to automate lexicon annotation, a task traditionally performed by humans. We present a proof of concept by evaluating ChatGPT's performance on annotating VADER's sentiment lexicon. Our findings demonstrate that ChatGPT achieves fair performance in this task, suggesting that LLMs can serve as a valuable tool for initial annotations, with subsequent refinement by domain specialists. This approach could significantly accelerate lexicon development and maintenance while balancing efficiency and accuracy. Our study provides insights into the capabilities and limitations of LLMs in lexicon annotation, paving the way for further research in automating linguistic resource development.
id RCAP_fd889da75f22bd099bc4864d30732819
oai_identifier_str oai:repositorium.sdum.uminho.pt:1822/95171
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
spelling This work has been supported by FCT - Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020.
2025-04-05T01:20:50Z
Portal Agregador
ONG
https://www.rcaap.pt/oai/openaire
2025-05-29T06:21:10.230582
dc.title.none.fl_str_mv Lexicon annotation with LLM: a proof of concept with ChatGPT
dc.contributor.none.fl_str_mv Universidade do Minho
dc.contributor.author.fl_str_mv Marcondes, Francisco Supino
Gala, Adelino de C.O.S.
Rodrigues, Manuel
Almeida, J. J.
Novais, Paulo
dc.subject.por.fl_str_mv ChatGPT
Lexicon annotation
LLMs
NLP
Natural Sciences::Computer and Information Sciences
publishDate 2025
dc.date.none.fl_str_mv 2025
2025-01-01T00:00:00Z
dc.type.driver.fl_str_mv conference paper
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://hdl.handle.net/1822/95171
url https://hdl.handle.net/1822/95171
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Marcondes, F.S., Gala, A., Rodrigues, M., Almeida, J.J., Novais, P. (2025). Lexicon Annotation with LLM: A Proof of Concept with ChatGPT. In: Quintián, H., et al. Hybrid Artificial Intelligent Systems. HAIS 2024. Lecture Notes in Computer Science, vol 14858. Springer, Cham. https://doi.org/10.1007/978-3-031-74186-9_16
978-3-031-74185-2
0302-9743
10.1007/978-3-031-74186-9_16
978-3-031-74186-9
https://link.springer.com/chapter/10.1007/978-3-031-74186-9_16
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Springer, Cham
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833602660853350400