Bibliographic details
Year of defense: |
2024 |
Main author: |
DEMETRIUS MOREIRA PANOVITCH |
Advisor: |
Bruno Magalhaes Nogueira |
Examination committee: |
Not provided by the institution |
Document type: |
Master's thesis
|
Access type: |
Open access |
Language: |
Portuguese |
Institution of defense: |
Fundação Universidade Federal de Mato Grosso do Sul
|
Graduate program: |
Not provided by the institution
|
Department: |
Not provided by the institution
|
Country: |
Brazil
|
Keywords in Portuguese: |
|
Access link: |
https://repositorio.ufms.br/handle/123456789/9210
|
Abstract: |
Secondary studies aggregate the literature relevant to a topic in order to evaluate it, provide an overview, or interpret it, among other purposes. However, their development is costly in time and resources and is subject to human bias at some stages, such as the identification of primary studies, which may compromise the quality and accuracy of the review. In this work, we propose an automated approach for one of the main steps of a secondary study: the formulation and refinement of search strings. The approach, called SeSGx-BT, uses BERTopic, a deep learning-based algorithm, to perform topic modeling on a set of studies used as a Quasi-Gold Standard. The topics are used to build search strings that are applied in a hybrid search strategy combining database search and snowballing. The results showed that, in hybrid search settings, SeSGx-BT finds a high number of relevant studies and a low number of irrelevant ones, yielding higher recall and precision, respectively, than SeSGx-LDA, a similar approach that uses LDA for topic extraction. These results suggest that deep learning-based approaches can capture more semantically meaningful topics, reducing human effort in the identification of primary studies. Based on the precision and recall values obtained in experiments with 10 datasets, SeSGx-BT is a promising solution for automating the formulation and refinement of search strings for secondary studies, with increases of up to 270% in precision and up to 20% in recall. |
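As a rough illustration of the two ideas the abstract relies on, the sketch below builds a Boolean search string from topic words and computes precision and recall for a set of retrieved studies. It is not the SeSGx-BT implementation: the function names, the `words_per_topic` parameter, and the rule of joining words within a topic with AND and topics with OR are assumptions made for this example, and the topics are taken as plain word lists rather than actual BERTopic output.

```python
def build_search_string(topics, words_per_topic=3):
    """Build a Boolean search string from topic word lists.

    Assumption for this sketch: the top words of each topic are joined
    with AND, and the resulting clauses are joined with OR.
    """
    clauses = []
    for words in topics:
        top_words = words[:words_per_topic]
        clause = " AND ".join(f'"{word}"' for word in top_words)
        clauses.append(f"({clause})")
    return " OR ".join(clauses)


def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved studies against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Precision: share of retrieved studies that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant studies that were retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

With hypothetical topics such as `[["software", "testing", "mutation"], ["search", "string"]]`, the first function yields one parenthesized AND clause per topic, and the second can then score whatever studies a database returns for that string against the known relevant set.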