Performance analysis in bug localization supported by dataset dissection
Year of defense: | 2022 |
---|---|
Main author: | |
Advisor: | |
Defense committee: | |
Document type: | Thesis |
Access type: | Open access |
Language: | eng |
Defense institution: | Universidade Federal de Uberlândia, Brazil, Programa de Pós-graduação em Ciência da Computação |
Graduate program: | Not informed by the institution |
Department: | Not informed by the institution |
Country: | Not informed by the institution |
Keywords in Portuguese: | |
Access link: | https://repositorio.ufu.br/handle/123456789/35654 http://doi.org/10.14393/ufu.te.2022.60 |
Abstract: | Finding and fixing software bugs is still a major challenge. These tasks demand as much effort and experience from developers as developing new functionality does. Over the last decades, the research community has actively produced approaches to support the debugging process. Bug Localization (BL) is an essential step regardless of the software repair approach applied (automated or manual), and automated BL techniques are critical to making the process more effective and efficient. There are many approaches to automated BL, and all of them share one goal: improving accuracy in classifying software components suspected of containing bugs. A recurrent issue is the lack of clarity about why these approaches succeed or fail on the assessed bug dataset, since most methods do not consider the nature and intrinsic characteristics of the bugs. The discussion remains too focused on performance gains over the previous state of the art. This work aims to contribute to software repair tasks, focusing primarily on supporting automated BL. First, we explore characteristics of the bugs commonly used to assess localization strategies (and, by extension, automated program repair). Then, we analyze the relationships between these bug characteristics and their influence on the performance of localization strategies. We start from a static information-based BL approach built on Learning to Rank (LtR) algorithms, which takes bug reports as input to the localization process. Initially, we analyze a well-known bug dataset, Defects4J, from which we extract various bug characteristics. Next, we analyze these characteristics in a larger dataset referred to as LR-dataset. Then, we propose various strategies and alternatives to improve the ranking of suspected buggy files generated by BL approaches. Examples include the use of new features (e.g., Code Entropy), hyperparameter tuning and training-data balancing in Machine Learning (ML)-based approaches and, finally, bug sampling guided by patch analysis. To do so, we tested these alternatives for improving the ranking of suspected components in an environment built for experimenting with and reproducing BL strategies. We show that pre-processing strategies applied to bug reports and to the dataset, along with the tuning of different LtR algorithms, can produce different ranking results even with past BL approaches. Moreover, the characteristics of the bugs sampled for assessment can influence the ranking scores of suspected buggy files, e.g., depending on the associated repair patterns and the repair actions required to fix the bugs. This is the case, for example, for the Missing Not-Null Check repair pattern: its presence in an experimental sample yields a suspiciousness ranking score 27.22 percentage points above the baseline, which does not consider the presence (or absence) of the pattern. These results point to opportunities to revisit past BL approaches through the lens of dissecting the datasets used in their assessment, with the potential for new insights, interpretations, and combinations of BL strategies. |
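
The abstract describes a BL approach that uses Learning to Rank (LtR) to order source files by suspiciousness for a given bug report. The following is only a minimal sketch of that general idea, under stated assumptions: each candidate file is described by a few numeric features (the names `text_similarity` and `code_entropy` are hypothetical, and how the thesis actually computes Code Entropy is not stated here); weights are learned with a simple pairwise perceptron, which is one basic LtR technique and not necessarily any algorithm used in the thesis; and the resulting ranking is scored with (mean) reciprocal rank, an assumed metric, since the abstract does not name the one behind its ranking scores. All helper names and feature values are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A source file considered for one bug report (hypothetical representation)."""
    path: str
    features: dict[str, float]  # hypothetical feature name -> value


def score(c: Candidate, weights: dict[str, float]) -> float:
    # Linear scoring: weighted sum of the file's feature values.
    return sum(weights.get(name, 0.0) * v for name, v in c.features.items())


def train_pairwise(reports, feature_names, epochs=10, lr=0.1):
    """Pairwise perceptron: learn weights that rank each buggy file above
    every non-buggy file of the same bug report.

    `reports` is a list of (candidates, buggy_paths) tuples.
    """
    weights = {name: 0.0 for name in feature_names}
    for _ in range(epochs):
        for candidates, buggy in reports:
            pos = [c for c in candidates if c.path in buggy]
            neg = [c for c in candidates if c.path not in buggy]
            for p in pos:
                for n in neg:
                    if score(p, weights) <= score(n, weights):
                        # Mis-ordered pair: shift weights toward the buggy file's features.
                        for name in feature_names:
                            delta = p.features.get(name, 0.0) - n.features.get(name, 0.0)
                            weights[name] += lr * delta
    return weights


def rank_files(candidates, weights):
    # Most suspicious file first.
    return sorted(candidates, key=lambda c: score(c, weights), reverse=True)


def reciprocal_rank(ranking, buggy):
    # 1 / position of the first buggy file in the ranking (0.0 if none appears).
    for position, c in enumerate(ranking, start=1):
        if c.path in buggy:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(results):
    # Average reciprocal rank over (ranking, buggy_paths) pairs.
    return sum(reciprocal_rank(r, b) for r, b in results) / len(results)


# Usage with toy, made-up feature values (not data from the thesis).
names = ["text_similarity", "code_entropy"]
train = [(
    [Candidate("A.java", {"text_similarity": 0.9, "code_entropy": 0.2}),
     Candidate("B.java", {"text_similarity": 0.4, "code_entropy": 0.7}),
     Candidate("C.java", {"text_similarity": 0.1, "code_entropy": 0.1})],
    {"A.java"},
)]
learned = train_pairwise(train, names)

new_report = [Candidate("X.java", {"text_similarity": 0.3, "code_entropy": 0.9}),
              Candidate("Y.java", {"text_similarity": 0.8, "code_entropy": 0.1})]
ranking = rank_files(new_report, learned)
print([c.path for c in ranking])                       # ['Y.java', 'X.java']
print(mean_reciprocal_rank([(ranking, {"Y.java"})]))   # 1.0
```

The pairwise formulation is used here only because it is compact; pointwise or listwise LtR algorithms, tuned hyperparameters (such as `epochs` and `lr`), or rebalanced training pairs would plug into the same train, rank, and evaluate steps.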