Contributions to bug-fixing time estimation: an empirical study in open source projects of apache ecosystem

Bibliographic Details
Main Author: Vieira, Renan Gomes
Publication Date: 2022
Format: Doctoral thesis
Language: English
Source: Repositório Institucional da Universidade Federal do Ceará (UFC)
Full text: http://www.repositorio.ufc.br/handle/riufc/66693
Summary: Fixing bugs is a crucial aspect of software maintenance. Developers and managers must handle many bug reports that need immediate attention despite limited resources and tight deadlines. Software projects generally use issue tracking systems to report and monitor bug-fixing tasks, and several researchers have used this data source to better understand the problem and to find ways to reduce costs and improve the efficiency of bug fixing. This thesis presents three contributions to the bug-fixing process. The first is a dataset and its mining script, together with a series of analyses and visualizations. We describe the data acquisition process, explain why a new dataset had to be mined, and provide a deeper analysis of the report fields used in the subsequent contributions of this thesis. The second contribution is a new approach to estimating bug-fixing time. We use the concept of bug report evolution to create a dataset containing every investigated state of each report. First, we check how often bug reports and their fields are updated. Next, we frame the estimation as a classification problem and evaluate our approach with different machine learning methods, output configurations, and class balancing techniques. Using the best models (across all tested designs) for the different stages of a bug report's evolution, we evaluate whether the models' estimation capacity differs significantly according to the report state. We gathered evidence that report fields are frequently updated, which characterizes the evolution of reports and affects the construction of bug-fixing-time estimation models. The evaluation shows promising results in predicting whether a bug will be fixed within five days or take longer, especially in the initial states of the reports.
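The abstract does not show how a report's evolution becomes training data. As a minimal sketch, assuming a simplified, hypothetical record format (`BugReport`, `FieldChange`, and the field names are illustrative, not the thesis's schema), each change in a report's history yields a new snapshot (state), and every snapshot is labeled by whether the report's total fixing time, measured from creation, exceeded five days; the thesis may define the target differently. Random oversampling stands in for the class balancing step only as one example technique, since the abstract does not name the ones used.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta

THRESHOLD_DAYS = 5  # split named in the abstract: fixed in less or more than five days

@dataclass
class FieldChange:
    """Hypothetical entry of an issue's change history."""
    when: datetime
    field: str
    value: str

@dataclass
class BugReport:
    """Hypothetical, heavily simplified bug report."""
    key: str
    created: datetime
    resolved: datetime
    initial_fields: dict
    history: list  # FieldChange entries, assumed sorted by time

def report_states(report):
    """Yield a snapshot of the report's fields at creation and after each update."""
    state = dict(report.initial_fields)
    yield dict(state)
    for change in report.history:
        state[change.field] = change.value
        yield dict(state)

def label(report):
    """Return 1 if the fix took more than THRESHOLD_DAYS days, else 0 (assumed target)."""
    days = (report.resolved - report.created) / timedelta(days=1)
    return int(days > THRESHOLD_DAYS)

def build_rows(reports):
    """One row per report state; every state carries the report's final label."""
    rows = []
    for report in reports:
        y = label(report)
        for i, state in enumerate(report_states(report)):
            rows.append({"key": report.key, "state_index": i, **state, "y": y})
    return rows

def random_oversample(rows, seed=0):
    """Duplicate random minority-class rows until both classes are equally large."""
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row in rows:
        by_class[row["y"]].append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        if group:
            balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

# Toy usage with made-up reports (not data from the thesis).
t0 = datetime(2021, 1, 1)
reports = [
    BugReport("PROJ-1", t0, t0 + timedelta(days=2), {"priority": "Major"},
              [FieldChange(t0 + timedelta(hours=3), "priority", "Critical")]),
    BugReport("PROJ-2", t0, t0 + timedelta(days=30), {"priority": "Minor"}, []),
]
rows = build_rows(reports)           # 3 rows: two states of PROJ-1 (y=0), one of PROJ-2 (y=1)
balanced = random_oversample(rows)   # 4 rows: two per class
```

Oversampling of this kind should be applied only to the training split; balancing before the train/test split leaks duplicated rows into the test set. Keeping `state_index` on each row makes it possible to evaluate a model separately for each report state.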
The third contribution is a study of the relationship between bug-fixing time and three fields: priority, links (relationships between reports), and code churn (computed from the fixing patch associated with the bug report). Using Bayesian data analysis, we evaluated two models: a ‘specific’ model for each project and a ‘hierarchical’ model covering all projects at once. We also explored three other hierarchical models to illustrate the flexibility of this type of modeling. We found evidence that bug reports with links and with higher code churn (above the project’s median) tend to take longer to fix, whereas the priority level appears to have no significant influence on bug-fixing time.
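To illustrate the difference between a per-project (‘specific’) analysis and a hierarchical one, the sketch below uses a deliberately simplified normal-normal model, not the thesis's models. It estimates the effect of one binary indicator (for example, code churn above the project median) on log fixing time separately in each project, then partially pools those estimates toward a precision-weighted mean with a fixed between-project standard deviation `tau`. The log scale, the fixed `tau`, and all function names are assumptions; a full Bayesian analysis would model all predictors jointly and place priors on the population mean and `tau`, typically sampling the posterior with MCMC.

```python
import math
from statistics import mean, median, variance

def above_median(values):
    """1 where a value exceeds its project's median (e.g. code churn), else 0."""
    m = median(values)
    return [int(v > m) for v in values]

def no_pooling_effect(log_times, flags):
    """Per-project ('specific'-style) estimate: difference in mean log fixing time
    between flagged (1) and unflagged (0) bugs, with its standard error.
    Each group needs at least two bugs for the sample variance."""
    g1 = [t for t, f in zip(log_times, flags) if f == 1]
    g0 = [t for t, f in zip(log_times, flags) if f == 0]
    diff = mean(g1) - mean(g0)
    se = math.sqrt(variance(g1) / len(g1) + variance(g0) / len(g0))
    return diff, se

def partial_pooling(estimates, tau):
    """Normal-normal hierarchical shrinkage with a fixed between-project sd `tau`.

    estimates: {project: (effect, standard_error)}
    Returns ({project: (posterior_mean, posterior_sd)}, pooled_mean).
    """
    # Precision-weighted population mean, plugged in as an empirical-Bayes estimate.
    weights = {p: 1.0 / (se**2 + tau**2) for p, (_, se) in estimates.items()}
    mu = sum(weights[p] * est for p, (est, _) in estimates.items()) / sum(weights.values())
    posterior = {}
    for p, (est, se) in estimates.items():
        precision = 1.0 / se**2 + 1.0 / tau**2
        post_mean = (est / se**2 + mu / tau**2) / precision
        posterior[p] = (post_mean, math.sqrt(1.0 / precision))
    return posterior, mu

# Toy usage with made-up numbers (not data from the thesis).
churn = [10, 50, 200, 5, 80, 300]
flags = above_median(churn)  # [0, 0, 1, 0, 1, 1]
log_days = [math.log(d) for d in [1, 2, 9, 1, 6, 20]]
specific = {"P1": no_pooling_effect(log_days, flags), "P2": (0.2, 0.3)}
hierarchical, mu = partial_pooling(specific, tau=0.5)
```

Projects with noisy estimates (large standard errors) are pulled more strongly toward the pooled mean, which is the practical benefit of a hierarchical view when some projects have few bug reports.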
Advisor: Gomes, João Paulo Pordeus
Co-advisor: Rocha, Lincoln Souza
Keywords: Bug report; Machine learning; Resolution time estimation; JIRA Tracking Issue System; Bayesian data analysis
Citation: VIEIRA, Renan Gomes. Contributions to bug-fixing time estimation: an empirical study in open source projects of apache ecosystem. 2022. 120 f. Thesis (Doctorate in Computer Science) - Universidade Federal do Ceará, Fortaleza, 2022.
Date deposited: 2022-06-24T19:39:06Z
Version: Published version
Access rights: Open access
Full-text PDF: http://repositorio.ufc.br/bitstream/riufc/66693/3/2022_tese_rgvieira.pdf
PDF checksum (MD5): b9cc63d5d9b6602f8fcd4a6a421c6eb1
Repository contact: bu@ufc.br, repositorio@ufc.br