Bibliographic details
Year of defense: 2025
Main author: Montes, Bruna Fontenele Amorim
Advisor: Souza, Renato Rocha
Defense committee: Not informed by the institution
Document type: Dissertation
Access type: Open access
Language: English
Defense institution: Not informed by the institution
Graduate program: Not informed by the institution
Department: Not informed by the institution
Country: Not informed by the institution
Keywords in Portuguese:
Access link: https://hdl.handle.net/10438/36695
Abstract:
This study evaluates the application of Natural Language Processing (NLP) techniques for analyzing quarterly earnings call transcripts from Brazilian banks, with a focus on comparing traditional topic modeling methods—Latent Dirichlet Allocation (LDA) and BERTopic—with advanced Large Language Models (LLMs), including GPT-4-turbo, Llama3, and Qwen2. The research is structured into three benchmark tasks: (1) comparing traditional NLP methods with GPT-4-turbo for unstructured topic modeling, (2) benchmarking GPT-4-turbo, Llama3, and Qwen2 in unstructured topic modeling using the "LLM-as-a-Judge" framework, and (3) evaluating LLMs for structured topic modeling and sentiment analysis using labeled datasets. The results reveal the limitations of traditional models in capturing nuanced, domain-specific content due to their reliance on bag-of-words and clustering techniques, particularly in small, homogeneous datasets. Conversely, LLMs demonstrated superior performance, leveraging pre-trained architectures to generate contextually rich and coherent outputs without requiring dataset-specific training. Among the LLMs, GPT-4-turbo consistently outperformed the others across tasks, achieving higher scores in coherence, accuracy, and contextual relevance. Open-source models like Qwen2 showed promise as resource-efficient alternatives, though with reduced consistency compared to GPT-4-turbo. The study highlights the evolving methodologies for evaluating modern NLP models, emphasizing the inadequacy of traditional coherence metrics, such as the UMass score, for assessing LLM outputs. By incorporating a hybrid evaluation approach—combining structured benchmarks, qualitative assessments, and the "LLM-as-a-Judge" framework—this research provides a comprehensive method for model comparison. These findings underline the transformative potential of LLMs in domain-specific applications and suggest pathways for future advancements in NLP evaluation techniques and model scalability.
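The "LLM-as-a-Judge" setup mentioned in the abstract can be sketched roughly as follows. This is an illustrative outline, not the author's implementation: the prompt wording, the 1-5 scale, and the `judge` stub (which would be replaced by a real API call to a model such as GPT-4-turbo) are all assumptions.

```python
# Minimal sketch of an "LLM-as-a-Judge" comparison loop for topic lists.
# The judge function passed in stands in for a real LLM API call; the
# prompt text and 1-5 rating scale are illustrative assumptions.

JUDGE_PROMPT = (
    "You are an expert financial analyst. Rate the following topic list\n"
    "extracted from an earnings-call transcript for coherence and\n"
    "contextual relevance on a scale of 1-5. Reply with a single integer.\n\n"
    "Topics: {topics}"
)

def judge(topics, judge_fn):
    """Ask the judge model to score one candidate topic list (1-5)."""
    prompt = JUDGE_PROMPT.format(topics="; ".join(topics))
    return int(judge_fn(prompt))

def rank_models(candidates, judge_fn):
    """Score each model's topic list and return (model, score) pairs, best first."""
    scores = {name: judge(topics, judge_fn) for name, topics in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    # Stub judge: a toy heuristic that rewards longer, more specific topic
    # labels. A real run would send the prompt to the judge LLM instead.
    stub = lambda prompt: str(min(5, len(prompt) // 80))
    candidates = {
        "LDA": ["bank", "quarter", "result"],
        "GPT-4-turbo": ["net interest margin outlook", "loan-loss provisioning",
                        "digital banking expansion"],
    }
    print(rank_models(candidates, stub))
```

In a real pipeline the stub would be an API call, and each candidate list would come from one of the compared systems (LDA, BERTopic, or an LLM); scores can then be averaged over transcripts to rank the models.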