NLP in Brazilian banking results: comparison between traditional topic modeling techniques and LLMs

Bibliographic details
Year of defense: 2025
Main author: Montes, Bruna Fontenele Amorim
Advisor: Souza, Renato Rocha
Defense committee: Not informed by the institution
Document type: Master's thesis
Access type: Open access
Language: English (eng)
Defending institution: Not informed by the institution
Graduate program: Not informed by the institution
Department: Not informed by the institution
Country: Not informed by the institution
Keywords in Portuguese:
Access link: https://hdl.handle.net/10438/36695
Abstract: This study evaluates the application of Natural Language Processing (NLP) techniques for analyzing quarterly earnings call transcripts from Brazilian banks, with a focus on comparing traditional topic modeling methods, Latent Dirichlet Allocation (LDA) and BERTopic, with advanced Large Language Models (LLMs), including GPT-4-turbo, Llama3, and Qwen2. The research is structured into three benchmark tasks: (1) comparing traditional NLP methods with GPT-4-turbo for unstructured topic modeling, (2) benchmarking GPT-4-turbo, Llama3, and Qwen2 in unstructured topic modeling using the innovative "LLM-as-a-Judge" framework, and (3) evaluating LLMs for structured topic modeling and sentiment analysis using labeled datasets. The results reveal the limitations of traditional models in capturing nuanced, domain-specific content due to their reliance on bag-of-words and clustering techniques, particularly in small, homogeneous datasets. Conversely, LLMs demonstrated superior performance, leveraging pre-trained architectures to generate contextually rich and coherent outputs without requiring dataset-specific training. Among the LLMs, GPT-4-turbo consistently outperformed the others across tasks, achieving higher scores in coherence, accuracy, and contextual relevance. Open-source models like Qwen2 showed promise as resource-efficient alternatives, though with reduced consistency compared to GPT-4-turbo. The study highlights the evolving methodologies for evaluating modern NLP models, emphasizing the inadequacy of traditional topic coherence metrics, such as UMass scores, for assessing LLM outputs. By incorporating a hybrid evaluation approach—combining structured benchmarks, qualitative assessments, and the "LLM-as-a-Judge" framework—this research provides a comprehensive method for model comparison.
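For context on the traditional metric the abstract argues is inadequate for LLM outputs: the UMass coherence score rates a topic's top words by how often word pairs co-occur in documents. A minimal pure-Python sketch follows; the toy corpus and word lists are illustrative, not drawn from the study's data.

```python
from math import log

def umass_coherence(topic_words, documents):
    """UMass coherence of a topic's top words over a tokenized corpus.

    Each ordered pair of top words is scored as
        log((D(w_i, w_j) + 1) / D(w_j)),
    where D counts documents containing the word(s); the topic's
    coherence is the sum over all pairs. Higher (less negative) is
    more coherent.
    """
    doc_sets = [set(doc) for doc in documents]
    total = 0.0
    for i in range(1, len(topic_words)):
        for j in range(i):
            w_i, w_j = topic_words[i], topic_words[j]
            d_j = sum(1 for d in doc_sets if w_j in d)          # D(w_j)
            d_ij = sum(1 for d in doc_sets if w_i in d and w_j in d)  # D(w_i, w_j)
            if d_j > 0:
                total += log((d_ij + 1) / d_j)
    return total

# Toy corpus of tokenized "documents"
docs = [["loan", "credit", "bank"], ["loan", "credit"], ["rate", "bank"]]
print(umass_coherence(["loan", "credit", "bank"], docs))
```

Because the score depends only on document-level co-occurrence counts, it cannot reward the contextual, free-form topic descriptions LLMs produce, which is the limitation the study points to.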
These findings underline the transformative potential of LLMs in domain-specific applications and suggest pathways for future advancements in NLP evaluation techniques and model scalability.
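The "LLM-as-a-Judge" setup described above can be sketched as a prompt builder plus a reply parser. The rubric, prompt wording, and JSON scoring format below are hypothetical illustrations, not the study's actual prompts, and the judge-model call itself is left out (any chat-completion client could supply `reply`).

```python
import json
import re

# Hypothetical judging prompt; double braces escape the literal JSON example.
JUDGE_PROMPT = """You are an impartial judge. Rate the candidate topic summary
for coherence and relevance to the transcript excerpt on a 1-5 scale.
Respond with JSON: {{"score": <int>, "reason": "<short justification>"}}

Transcript excerpt:
{excerpt}

Candidate topic summary:
{summary}
"""

def build_judge_prompt(excerpt, summary):
    """Fill the judging template with one excerpt/summary pair."""
    return JUDGE_PROMPT.format(excerpt=excerpt, summary=summary)

def parse_judge_reply(reply):
    """Extract the integer score from the judge model's reply,
    tolerating extra prose around the JSON object."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if not match:
        raise ValueError("no JSON object found in judge reply")
    score = int(json.loads(match.group(0))["score"])
    if not 1 <= score <= 5:
        raise ValueError(f"score out of range: {score}")
    return score
```

Parsing defensively matters in practice: judge models often wrap the requested JSON in explanatory prose, so the parser pulls out the first brace-delimited span rather than assuming a clean JSON reply.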