Use of LLMs to support the generation of search strings for the development of Secondary Studies

Bibliographic details
Defense year: 2025
Main author: MARIA LUÍSA DE BARROS COSTA SILVA
Advisor: Bruno Magalhaes Nogueira
Defense committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: por
Defending institution: Fundação Universidade Federal de Mato Grosso do Sul
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Brazil
Keywords in Portuguese:
LLM
Access link: https://repositorio.ufms.br/handle/123456789/11654
Abstract: Secondary Studies (SS) have been a widely used methodology in the Software Engineering scientific field since the introduction of Evidence-Based Software Engineering. The main objective of a Secondary Study is to gather all available information on a concept or phenomenon. One of the steps required to conduct an SS is the definition and execution of a search strategy. One of the main strategies applied is automated search; to perform it, a search string must be created and refined for use in search engines. In recent years, the text-processing domain has evolved greatly with the advance of Large Language Models (LLMs), which, through the transformer architecture and a large number of parameters, deliver high semantic performance combined with low complexity of use. Motivated by the difficulty of constructing search strings, this work proposes SeSG-LLM, a tool based on the Search String Generator (SeSG) of Alves et al. (2022). The SeSGx-LLM version aims to integrate Large Language Models into the SeSG framework. The results demonstrated that LLMs can facilitate the generation of the synonyms that compose the strings, with Mistral 7B exhibiting the most consistent performance among the tested models. Additionally, the findings indicated that LDA demonstrated superior performance in keyword extraction.
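To make the idea of composing a search string from keywords and their synonyms concrete, the following is a minimal Python sketch. It is not the SeSG or SeSG-LLM implementation: the function names, the input format, and the composition rule (OR within a keyword group, AND between groups) are assumptions chosen for illustration, and the example terms are hypothetical.

```python
def quote(term: str) -> str:
    """Wrap a term in double quotes so multi-word phrases stay together."""
    return f'"{term}"'


def build_search_string(keyword_synonyms: dict[str, list[str]]) -> str:
    """OR each keyword with its synonyms, then AND the resulting groups.

    Assumed rule for illustration only; the real tool may compose
    its strings differently.
    """
    groups = []
    for keyword, synonyms in keyword_synonyms.items():
        # Deduplicate case-insensitively while keeping first-seen order.
        seen = set()
        terms = []
        for term in [keyword, *synonyms]:
            cleaned = term.strip()
            key = cleaned.lower()
            if key and key not in seen:
                seen.add(key)
                terms.append(cleaned)
        if terms:
            groups.append("(" + " OR ".join(quote(t) for t in terms) + ")")
    return " AND ".join(groups)


# Hypothetical input: keywords as a topic model such as LDA might return
# them, synonyms as an LLM might suggest them.
example = {
    "search string": ["query string", "Search String"],
    "secondary study": ["systematic review", "mapping study"],
}
print(build_search_string(example))
```

Quoting every term keeps multi-word phrases intact in most search engines, and the case-insensitive deduplication guards against an LLM returning the keyword itself as a synonym.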