Understanding and classifying code harmfulness
Year of defense: | 2020 |
---|---|
Main author: | |
Advisor: | |
Defense committee: | |
Document type: | Master's thesis (Dissertação) |
Access type: | Open access |
Language: | Portuguese |
Defending institution: | Universidade Federal de Alagoas, Brasil, Programa de Pós-Graduação em Informática, UFAL |
Graduate program: | Not provided by the institution |
Department: | Not provided by the institution |
Country: | Not provided by the institution |
Keywords (Portuguese): | |
Access link: | http://www.repositorio.ufal.br/handle/riufal/6966 |
Abstract: | Code smells typically indicate poor implementation choices that may degrade software quality. Hence, they need to be carefully detected to avoid such degradation. In this context, some studies try to understand the impact of code smells on software quality, while others propose rule-based or machine-learning techniques to detect them. However, none of these studies or techniques focuses on analyzing code snippets that are actually harmful to software quality. Our study aims to understand and classify code harmfulness. We analyze harmfulness in terms of CLEAN, SMELLY, BUGGY, and HARMFUL code. By harmful code, we mean code that has already harmed software quality and is still prone to do so. We perform our study with 22 smell types, 803 versions of 12 open-source projects, 40,340 bugs, and 132,219 code smells. The results show that, even though the number of code smells is high, only 0.07% of those smells are harmful. Abstract Function Call From Constructor is the smell type most related to harmful code. To cross-validate our results, we also perform a survey with 77 developers. Most of them (90.4%) consider code smells harmful to the software, and 84.6% of those developers believe that code smell detection tools are important. However, these developers are not concerned with selecting tools that are able to detect harmful code. We also evaluate machine learning techniques to classify code harmfulness: they reach an effectiveness of at least 97% in classifying harmful code. While Random Forest is effective in classifying both smelly and harmful code, Gaussian Naive Bayes is the least effective technique. Our results also suggest that both software metrics and developers' metrics are important for classifying harmful code. |
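The abstract compares Random Forest and Gaussian Naive Bayes for classifying code into CLEAN, SMELLY, BUGGY, and HARMFUL classes from software and developer metrics. The sketch below is a minimal, hypothetical illustration of such a pipeline with scikit-learn; the feature names, synthetic data, class balance, and evaluation setup are assumptions for illustration only, not the dissertation's actual dataset or protocol.

```python
# Hypothetical sketch: comparing Random Forest and Gaussian Naive Bayes for
# classifying code harmfulness (CLEAN/SMELLY/BUGGY/HARMFUL), as discussed in
# the abstract. The features and data below are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)

# Assumed feature set: software metrics (size, complexity, smell count)
# plus developer metrics (author activity and experience).
n_samples = 1000
X = np.column_stack([
    rng.integers(10, 2000, n_samples),   # lines of code
    rng.integers(1, 50, n_samples),      # cyclomatic complexity
    rng.integers(0, 10, n_samples),      # code smells in the snippet
    rng.integers(1, 500, n_samples),     # commits by the author
    rng.integers(0, 15, n_samples),      # author's years of experience
])
# Labels: 0 = CLEAN, 1 = SMELLY, 2 = BUGGY, 3 = HARMFUL (synthetic, imbalanced
# to mimic the rarity of harmful code reported in the study).
y = rng.choice([0, 1, 2, 3], size=n_samples, p=[0.60, 0.30, 0.08, 0.02])

models = [
    ("Random Forest", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("Gaussian Naive Bayes", GaussianNB()),
]
for name, model in models:
    # Macro-averaged F1 gives equal weight to the rare HARMFUL class.
    scores = cross_val_score(model, X, y, cv=5, scoring="f1_macro")
    print(f"{name}: mean macro-F1 = {scores.mean():.3f}")
```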