A machine learning approach to escaped defect analysis

Bibliographic details
Year of defense: 2022
Main author: NEPOMUCENO, Késsia Thais Cavalcanti
Advisor: PRUDÊNCIO, Ricardo Bastos Cavalcante
Defense committee: Not provided by the institution
Document type: Master's dissertation
Access type: Embargoed access
Language: eng
Defending institution: Universidade Federal de Pernambuco
Graduate program: Programa de Pos Graduacao em Ciencia da Computacao
Department: Not provided by the institution
Country: Brazil
Keywords in Portuguese:
Access link: https://repositorio.ufpe.br/handle/123456789/48529
Abstract: Defects in computer systems or applications directly affect the quality and performance of the final product, with consequences for both the user and the supplier. Identifying escaped defects, that is, defects that the tester did not detect at the proper stage and that were therefore incorporated into the product, is thus one of the software industry’s primary activities. To mitigate or eliminate escaped defects, companies usually have a sector responsible for analyzing and evaluating the missed bugs in order to understand the context in which they arose and correct the flaws. The aim is to avoid repetition and improve product quality and test performance. The analysis of escaped defects is also used to measure the testing team’s performance and the launch of new products and services. However, despite being a crucial activity, it requires resources such as time, equipment, and training, which makes its consistent and precise application unfeasible. For this reason, in partnership with Motorola Mobility, we built a machine learning system to automate the analysis of escaped defects and optimize the manual process, reducing the resources invested in the analysis stages. To that end, the company provided us with information about the process, including historical data on the latest analyses performed manually by company employees. Thus, our model relies on real industry bug reports as historical data. From the Motorola Bug Report, we collected and processed the data on escaped and non-escaped defects, used it as input to our model, and applied Random Forest as the main classifier. As a result, we ranked the Bug Reports most likely to become escaped defects. To measure the classifier’s performance, we used the ROC curve and a new metric that we propose, the cost-benefit curve. On both metrics, we obtained significant and promising results.
That said, our main contributions in this work are the escaped defect analysis system and the cost-benefit curve metric that we used to measure the system’s performance. Testers in the software industry will therefore be able to direct their efforts toward the Bug Reports that are more likely to become escaped defects, optimizing operational resources.
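
The ranking and ROC evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the report identifiers, scores, and labels below are hypothetical, and the scores stand in for the escape probabilities a classifier such as Random Forest might output. The cost-benefit curve is not sketched, because the abstract does not define it.

```python
from typing import Dict, List


def rank_reports(scores: Dict[str, float]) -> List[str]:
    """Return report ids ordered from most to least likely to escape."""
    return sorted(scores, key=lambda report_id: scores[report_id], reverse=True)


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (escaped, non-escaped) pairs in which the escaped defect
    receives the higher score; ties count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute the AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores (assumed probability of escaping).
scores = {"BR-1": 0.91, "BR-2": 0.15, "BR-3": 0.62, "BR-4": 0.40}
# Hypothetical ground truth: 1 = escaped defect, 0 = non-escaped defect.
labels = {"BR-1": 1, "BR-2": 0, "BR-3": 0, "BR-4": 1}

ranking = rank_reports(scores)
auc = roc_auc([labels[r] for r in ranking], [scores[r] for r in ranking])
print(ranking)
print(auc)
```

A tester would review the reports in the order given by `ranking`; an AUC close to 1 means that escaped defects tend to appear near the top of that list, while 0.5 corresponds to a random ordering.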