Bibliographic details
Year of defense: 2024
Main author: Pinheiro, Darwin de Oliveira
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: eng
Defending institution: Not provided by the institution
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://repositorio.ufc.br/handle/riufc/78984
Abstract:
Refactoring changes the internal structure of code without changing its external behavior, improving quality, maintainability, and readability, in addition to reducing technical debt. Studies indicate the need to improve the detection and correction of refactorings, recommending the use of machine learning to investigate motivations, difficulties, and improvements in software. This Master's dissertation aims to identify the relationship between trivial and non-trivial refactorings and to propose a metric that evaluates the triviality of implementing a refactoring. First, we use supervised learning classifiers to examine the impact of trivial refactorings on the prediction of non-trivial ones. We analyzed three datasets, covering 1,291 open-source projects and approximately 1.9M refactoring operations, using 45 code metrics; five classification models were evaluated across different dataset configurations. Second, we propose an ML-based metric to evaluate the triviality of a refactoring, considering its complexity, speed, and risk. The study examined how the prioritization of 58 features, identified by 15 developers, affected the effectiveness of seven regression and ensemble models. In addition, the alignment between the perceptions of 16 experienced developers and the results of the models was verified.
Our results are promising: (i) algorithms such as Random Forest, Decision Tree, and Neural Network performed best when using code metrics to identify refactoring opportunities; (ii) separating trivial and non-trivial refactorings improves model effectiveness, even across different datasets; (iii) using all available features outperforms the prioritization made by developers in predictive models; (iv) ensemble models, such as Random Forest and Gradient Boosting, outperform linear models regardless of feature prioritization; and (v) there is strong alignment between the perceptions of experts and the results of the models. In summary, this Master's dissertation contributes to the refactoring process, offering important support for developers, as it can inform the decision of whether or not to apply a refactoring. In addition, it highlights insights, challenges, and opportunities for future work.
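To make the classification setup described in the abstract concrete, the following is a minimal sketch in Python/scikit-learn of training a Random Forest to separate trivial from non-trivial refactorings from code metrics. The data here is synthetic and the labeling rule is invented for illustration; the dissertation's actual features, datasets, and pipeline differ.

```python
# Illustrative sketch only: synthetic stand-ins for code metrics
# (e.g., size, complexity, coupling) and a hypothetical labeling rule.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(42)
n = 1000

# Three synthetic "code metric" features per refactoring operation.
X = rng.normal(size=(n, 3))
# Hypothetical rule: operations on small, simple code are "trivial" (label 1).
y = ((X[:, 0] + 0.5 * X[:, 1]) < 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

f1 = f1_score(y_test, clf.predict(X_test))
print(f"F1 on held-out synthetic data: {f1:.2f}")
```

In the same spirit as result (i), an ensemble like Random Forest learns the metric/label relationship directly from examples; swapping in a Decision Tree or Gradient Boosting model requires changing only the classifier class.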