Batch algorithms and fixed prediction rates for online Just-In-Time Software Defect Prediction

Bibliographic details
Year of defense: 2021
Main author: PESSOA, Dinaldo Andrade
Advisor: Not informed by the institution
Defense committee: Not informed by the institution
Document type: Master's thesis (Dissertação)
Access type: Embargoed access
Language: eng
Defending institution: Universidade Federal de Pernambuco
UFPE
Brazil
Programa de Pós-Graduação em Ciência da Computação
Graduate program: Not informed by the institution
Department: Not informed by the institution
Country: Not informed by the institution
Keywords in Portuguese:
Access link: https://repositorio.ufpe.br/handle/123456789/40529
Abstract: Just-In-Time Software Defect Prediction (JIT-SDP) aims to predict the presence of defects in code changes at commit time, instead of inspecting modules (i.e., files or packages) in offline mode as performed in traditional Software Defect Prediction (SDP). In a real-world application of JIT-SDP, predictions must be made in an online fashion so that the developer is informed about the presence of a defect as soon as the code change is submitted, giving the developer the opportunity to further inspect the change while it is still fresh in their mind. On the other hand, model training can be done in either an online or a batch fashion, since this problem domain has no real-time requirements. Regardless of the type of training, it is important to note that a code change is not labeled immediately after its submission to the source code repository. Labeling may take days or months, depending on the time spent by the software development team to find and fix each defect, so the model must wait some time before it can trust the label of a code change. This waiting time is known as verification latency. Another challenge faced by a JIT-SDP model is the fluctuation of the class imbalance rate over time; this kind of concept drift is known as class imbalance evolution. This work investigates the use of batch algorithms for dealing with JIT-SDP in the context of verification latency and class imbalance evolution. In comparison to the state of the art, which is based on online algorithms, our approach (BORB) achieved improvements between +2% and +11% in g-mean on 9 of the 10 investigated datasets. On only one dataset did BORB achieve a result inferior to the state-of-the-art approach, a decrease of −2% in g-mean. In addition, this work investigates the predictive performance in a context in which the model is constrained to output a fixed defect prediction rate.
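The g-mean reported above is the geometric mean of the per-class recalls, a standard metric for imbalanced classification. A minimal sketch of its computation for the binary (defect-inducing vs. clean) case; the confusion counts below are illustrative, not figures from the dissertation:

```python
import math

def g_mean(tp, fn, tn, fp):
    """Geometric mean of per-class recalls for binary defect prediction.

    tp/fn: defect-inducing changes correctly/incorrectly classified.
    tn/fp: clean changes correctly/incorrectly classified.
    """
    recall_defect = tp / (tp + fn) if (tp + fn) else 0.0
    recall_clean = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(recall_defect * recall_clean)

# Illustrative counts only: 0.8 recall on defects, 0.9 on clean changes.
print(round(g_mean(tp=40, fn=10, tn=900, fp=100), 3))  # 0.849
```

Because it multiplies the two recalls, g-mean drops to zero whenever a model ignores either class entirely, which is why it is preferred over accuracy under class imbalance.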
More specifically, the defect prediction rate is an online rate that corresponds to the number of predictions returning the defect class divided by the total number of predictions in a time interval, and a fixed defect prediction rate means constraining the model to maintain the specified rate over time. The results of the experiments show that, under this constraint, methods with a higher capability to keep the defect prediction rate close to the fixed rate set by hyperparameter tuning also obtain a higher predictive performance on the testing data, i.e., there is a meaningful correlation between this capability and the predictive performance; the correlation coefficient between them is 0.44. This result, combined with the simplicity of the approach, suggests that a fixed defect prediction rate may be used as a standard baseline for the problem of class imbalance evolution.
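The online defect prediction rate described above can be sketched as a sliding-window computation over the stream of predictions. The window size and function name here are illustrative assumptions, not details taken from the dissertation:

```python
from collections import deque

def defect_prediction_rate(predictions, window=100):
    """Online defect prediction rate over a sliding time window.

    predictions: stream of model outputs, where 1 = predicted
    defect-inducing and 0 = predicted clean.
    Returns the rate observed after each incoming prediction.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` predictions
    rates = []
    for p in predictions:
        recent.append(p)
        rates.append(sum(recent) / len(recent))
    return rates

# Tiny illustrative stream with a window of 3 predictions.
print(defect_prediction_rate([1, 0, 0, 1, 0], window=3))
```

A constrained model in this setting would adjust its decision threshold online so that this observed rate stays near the fixed target rate chosen during hyperparameter tuning.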