Abstract: |
This study evaluates the application of Natural Language Processing (NLP) techniques for analyzing quarterly earnings call transcripts from Brazilian banks, focusing on a comparison between traditional topic modeling methods—Latent Dirichlet Allocation (LDA) and BERTopic—and advanced Large Language Models (LLMs), including GPT-4-turbo, Llama3, and Qwen2. The research is structured into three benchmark tasks: (1) comparing traditional NLP methods with GPT-4-turbo for unstructured topic modeling, (2) benchmarking GPT-4-turbo, Llama3, and Qwen2 in unstructured topic modeling using the "LLM-as-a-Judge" framework, and (3) evaluating LLMs for structured topic modeling and sentiment analysis on labeled datasets. The results reveal the limitations of traditional models in capturing nuanced, domain-specific content, owing to their reliance on bag-of-words representations and clustering techniques, particularly on small, homogeneous datasets. Conversely, LLMs demonstrated superior performance, leveraging pre-trained architectures to generate contextually rich and coherent outputs without requiring dataset-specific training. Among the LLMs, GPT-4-turbo consistently outperformed the others across tasks, achieving higher scores in coherence, accuracy, and contextual relevance. Open-source models such as Qwen2 showed promise as resource-efficient alternatives, though with less consistency than GPT-4-turbo. The study also highlights the evolving methodologies for evaluating modern NLP models, emphasizing the inadequacy of traditional topic-quality metrics, such as coherence measures like the UMass score, for assessing LLM outputs. By adopting a hybrid evaluation approach—combining structured benchmarks, qualitative assessments, and the "LLM-as-a-Judge" framework—this research provides a comprehensive method for model comparison. These findings underline the transformative potential of LLMs in domain-specific applications and suggest pathways for future advances in NLP evaluation techniques and model scalability.
---