ToMAS: Sumarização abstrativa multinível baseada em tópicos usando LLMs

Detalhes bibliográficos
Ano de defesa: 2025
Autor(a) principal: Capellaro, Leonardo
Orientador(a): Caseli, Helena de Medeiros lattes
Banca de defesa: Não Informado pela instituição
Tipo de documento: Dissertação
Tipo de acesso: Acesso aberto
Idioma: por
Instituição de defesa: Universidade Federal de São Carlos
Câmpus São Carlos
Programa de Pós-Graduação: Programa de Pós-Graduação em Ciência da Computação - PPGCC
Departamento: Não Informado pela instituição
País: Não Informado pela instituição
Palavras-chave em Português:
LLM
X
Palavras-chave em Inglês:
Área do conhecimento CNPq:
Link de acesso: https://repositorio.ufscar.br/handle/20.500.14289/21352
Resumo: Recently, political discussions in Brazil have gained prominence, becoming one of the most debated topics on social media. Factors such as the diversity of themes, the occurrence of events that attract public attention, and the constant increase in the volume of messages have made it challenging to identify in a clear, concise, and objective manner the main topics of the posts. In this context, this study proposes a new automated method that explores the use of automatic topic generation and multi-level summarization techniques using large-scale language models. The proposed method was evaluated to generate summaries from tweets about Brazilian politics collected during the 2022 presidential elections. In addition to the multi-level summarization method, a new summary evaluation method was also proposed, based on the divide and conquer strategy for applying the BERTScore automatic evaluation measure to lengthy texts, which also incorporates the generated sentence size as a weight in the evaluation score. Qualitative and quantitative analyses indicate that the combination of these techniques was able to successfully extract and summarize the main topics, demonstrating great potential to be a useful informative tool in the assessment of different opinions, issues, and topics publicly discussed.