Studying the prevalence of Atoms of Confusion in long-lived Java libraries

Bibliographic details
Year of defense: 2022
Main author: Mendes, Wendell Militão Fernandes
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Dissertation
Access type: Open access
Language: eng
Defense institution: Not provided by the institution
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://www.repositorio.ufc.br/handle/riufc/69632
Abstract: Program comprehension is a fundamental activity in software maintenance and evolution, affecting tasks such as bug fixing, code reuse, and the implementation of new features. An Atom of Confusion (AC) is considered the smallest piece of code that can confuse programmers, hindering correct understanding of the source code under consideration. Previous studies have shown that these atoms can significantly impact the presence of bugs in C/C++ programs and increase the time and effort required to understand code in C/C++ and Java programs. To gather more evidence about the diffusion of ACs in the Java ecosystem, we conducted a study analyzing the prevalence, co-occurrence (at the class level), and evolution of ACs in 27 long-lived Java libraries. To support our investigation, we developed an automated AC search tool called BOHR. This tool aims to: (i) aid in the identification of ACs in Java systems; (ii) provide prevalence reports for these ACs; and (iii) provide an API for developing new custom finders that capture new ACs or improve the existing AC detectors. BOHR is able to detect 10 of the 14 types of ACs identified by Langhout and Aniche (LANGHOUT; ANICHE, 2021). We also provide a manually annotated dataset used to validate BOHR's accuracy. Using BOHR, we found 11,404 AC occurrences in the studied libraries. The Conditional Operator and Logic as Control Flow ACs were the most prevalent among the 10 types assessed. Our findings also show that Conditional Operator and Logic as Control Flow were more likely to co-occur in the same class. Finally, we observed that the prevalence of ACs did not decrease over time. On the contrary, in 13 libraries the number of ACs grew proportionally faster than the size of the library in lines of code. Furthermore, in 15 libraries the fraction of Java classes containing at least one AC also increased over time.
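For readers unfamiliar with the two atoms the abstract reports as most prevalent, the following minimal Java sketch shows one plausible instance of each, next to an equivalent form without the atom. The class and method names (AtomExamples, withConditionalOperator, and so on) are hypothetical and chosen only for illustration; the snippets are not taken from the dissertation, from the studied libraries, or from BOHR.

```java
public class AtomExamples {

    // Conditional Operator atom: the ternary packs a branch into one expression.
    static int withConditionalOperator(int v) {
        return v < 0 ? -v : v;
    }

    // Equivalent behavior with an explicit if statement.
    static int withoutConditionalOperator(int v) {
        if (v < 0) {
            return -v;
        }
        return v;
    }

    // Logic as Control Flow atom: the short-circuit || silently decides
    // whether the side effect ++b is executed at all.
    static int withLogicAsControlFlow(int a, int b) {
        if (++a > 0 || ++b > 0) {
            a = a * 2;
            b = b * 2;
        }
        return a + b;
    }

    // Equivalent behavior with the side effects made explicit.
    static int withoutLogicAsControlFlow(int a, int b) {
        a = a + 1;
        if (a <= 0) {
            b = b + 1; // runs only when the first comparison fails
        }
        if (a > 0 || b > 0) {
            a = a * 2;
            b = b * 2;
        }
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(withConditionalOperator(-3));      // prints 3
        System.out.println(withLogicAsControlFlow(1, 5));     // prints 14
        System.out.println(withoutLogicAsControlFlow(1, 5));  // prints 14
    }
}
```

In each pair, both versions return the same value for every input; the difference is only in how visible the branching and side effects are to a reader.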