Identificação de smells em testes fim-a-fim implementados usando a ferramenta Cypress [Identification of smells in end-to-end tests implemented using the Cypress tool]

Bibliographic details
Year of defense: 2024
Main author: Larissa de Cássia Nazaré Bicalho
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: Portuguese (por)
Defending institution: Universidade Federal de Minas Gerais
Brazil
ICX - DEPARTAMENTO DE CIÊNCIA DA COMPUTAÇÃO
Programa de Pós-Graduação em Ciência da Computação
UFMG
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://hdl.handle.net/1843/75915
Abstract: Considering that software systems are among the most complex human constructions ever made, it is natural for a variety of errors and inconsistencies to occur. To prevent such issues from reaching end users and causing harm, testing activities are necessary in software development projects. One of the most common methods is end-to-end testing, which aims to verify that the system as a whole behaves according to its requirements. To implement this type of testing, developers rely on tools such as Selenium, Cypress, and Playwright, among others. Despite the increasing use of these tools, few studies evaluate the bad practices associated with their use. To address this issue, this research investigated the bad practices related to the use of Cypress, a JavaScript framework for end-to-end testing. Initially, a study was conducted to catalog the most common smells in such tests through a Systematic Literature Review (SLR) and a Grey Literature Review (GLR), resulting in the identification of 14 specific smells in end-to-end tests implemented with Cypress. Subsequently, methods for automatically identifying these smells were evaluated. Large Language Models (LLMs) such as ChatGPT, which are used to automate a variety of tasks, including those relevant to software development, were employed. The ability of ChatGPT to identify these problems was assessed through a controlled case study and a field study with GitHub applications. In the controlled study, ChatGPT successfully identified 12 of the 14 cataloged smells. Eight of these 12 smells (67%) were detected after the first request. The field study evaluated end-to-end tests implemented in three open-source systems: Pigallery2, Livewire, and GlobaLeaks. For Pigallery2, detection reached a precision of 0.31 and a recall of 0.62. For Livewire, the values were 0.24 for precision and 0.44 for recall. Finally, GlobaLeaks had the worst results, with a precision of 0.15 and a recall of 0.31.
The main cause of the low precision and recall obtained in this second study was inefficiency in detecting certain smells, such as Brittle Selectors. By combining an SLR and a GLR, the research yielded promising results in the form of a catalog of smells for tests developed with Cypress. Regarding detection, it can be concluded that ChatGPT is not efficient at identifying these smells.
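
For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal TypeScript sketch of the standard definitions of precision and recall. The `DetectionCounts` type, the function names, and the counts in `example` are hypothetical and are not taken from the dissertation; the record does not state how the reported values were computed beyond naming the metrics.

```typescript
// Illustrative only: the type, function names, and counts are assumptions,
// not data or code from the dissertation.
interface DetectionCounts {
  truePositives: number;  // smells reported by the detector and confirmed by manual review
  falsePositives: number; // smells reported by the detector but not confirmed
  falseNegatives: number; // smells found by manual review but missed by the detector
}

// Precision: share of reported smells that are real.
function precision(c: DetectionCounts): number {
  const reported = c.truePositives + c.falsePositives;
  return reported === 0 ? 0 : c.truePositives / reported;
}

// Recall: share of real smells that were reported.
function recall(c: DetectionCounts): number {
  const actual = c.truePositives + c.falseNegatives;
  return actual === 0 ? 0 : c.truePositives / actual;
}

// Hypothetical counts chosen only to make the arithmetic easy to follow.
const example: DetectionCounts = {
  truePositives: 5,
  falsePositives: 15,
  falseNegatives: 5,
};

console.log(precision(example).toFixed(2), recall(example).toFixed(2));
```

With these made-up counts, a detector that reports many unconfirmed smells ends up with low precision even when it finds half of the real ones, which is the general pattern the abstract describes for the field study.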