How they relate and leave: understanding atoms of confusion in open-source java projects
| Main author: | Pinheiro Neto, Francisco Oton |
|---|---|
| Publication date: | 2024 |
| Document type: | Master's dissertation |
| Language: | eng |
| Source title: | Repositório Institucional da Universidade Federal do Ceará (UFC) |
| Full text: | http://repositorio.ufc.br/handle/riufc/76456 |
Abstract: | Software comprehension is essential to avoiding mistakes throughout the software development lifecycle. Code confusion occurs when a developer and the computer reach different interpretations of the behavior of the same piece of code. Such code can be characterized by small, isolated patterns called Atoms of Confusion (ACs). In this study, we empirically investigated the effects of ACs on the development lifecycle of 21 open-source Java projects. We built a dataset linking more than 8,000 commits, 4,000 reported issues, and 7,000 ACs from the subject projects. Our findings show a positive correlation between the number of ACs and the number of reported bugs and improvements. We also examined commit changes to better understand the contexts in which ACs are removed. Because each commit is linked to at least one reported issue (e.g., a bug or an improvement), we could compare the AC removal ratio across commit types and use it as a proxy for whether ACs are likely to be the cause behind a reported issue. Of the 19 studied projects that had ACs removed in commits, 14 showed a higher ratio of removed ACs in bug-fix and improvement commits than in the other commit types (task, sub-task, new feature, wish, and test). Finally, to support the quantitative results, we conducted a qualitative analysis of how often ACs contributed to the occurrence of a bug or improvement. We inspected ACs removed in bug-fix and improvement commits with up to ten removed lines, analyzing the source code, the message of each commit involved, and the title, description, and comments of the related Jira issues. Of the 8,641 commits across the 21 analyzed projects, 391 removed ACs; 53 of those met the condition for the qualitative analysis. In 7 of these commits, 9 removed ACs likely contributed directly to the occurrence of a bug or improvement. To the best of our knowledge, this research is the first to investigate the connection between Atoms of Confusion and the source of bugs or the cause of improvements in Java projects. |
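The commit counts quoted in the abstract form a narrowing funnel, and it can help to see them as percentages. The sketch below is only an illustration of that arithmetic, not the thesis's analysis code: the constant names and the `share` helper are hypothetical, and the only inputs are the four counts stated in the abstract.

```python
# Counts taken directly from the abstract; the names are illustrative only.
TOTAL_COMMITS = 8641                 # commits in the dataset (21 projects)
COMMITS_REMOVING_ACS = 391           # commits that removed at least one AC
COMMITS_INSPECTED = 53               # commits meeting the qualitative-analysis condition
COMMITS_WITH_CONTRIBUTING_ACS = 7    # inspected commits where a removed AC likely contributed


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, 1)


# Each stage is expressed relative to the stage before it.
removal_share = share(COMMITS_REMOVING_ACS, TOTAL_COMMITS)
inspected_share = share(COMMITS_INSPECTED, COMMITS_REMOVING_ACS)
contributing_share = share(COMMITS_WITH_CONTRIBUTING_ACS, COMMITS_INSPECTED)

print(f"Commits removing ACs:          {removal_share}% of all commits")
print(f"Commits inspected manually:    {inspected_share}% of AC-removing commits")
print(f"Commits with contributing ACs: {contributing_share}% of inspected commits")
```

These percentages describe only the overall funnel; the per-project, per-commit-type removal ratios discussed in the abstract would need the underlying dataset, which this record does not include.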
| id | UFC-7_c5d0e0dc13deabc89d93c69b1e078f52 |
|---|---|
| oai_identifier_str | oai:repositorio.ufc.br:riufc/76456 |
| network_acronym_str | UFC-7 |
| network_name_str | Repositório Institucional da Universidade Federal do Ceará (UFC) |
| repository_id_str | |
| dc.title.pt_BR.fl_str_mv | How they relate and leave: understanding atoms of confusion in open-source java projects |
| author | Pinheiro Neto, Francisco Oton |
| author_facet | Pinheiro Neto, Francisco Oton |
| author_role | author |
| dc.contributor.co-advisor.none.fl_str_mv | Rocha, Lincoln Souza |
| dc.contributor.author.fl_str_mv | Pinheiro Neto, Francisco Oton |
| dc.contributor.advisor1.fl_str_mv | Carvalho, Windson Viana de |
| contributor_str_mv | Carvalho, Windson Viana de |
| dc.subject.cnpq.fl_str_mv | CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO |
| topic | CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO; Compreensão de programa; Átomos de confusão; Estudo empírico; Mineração de dados; Program comprehension; Atoms of confusion; Empirical study; Data mining |
| dc.subject.ptbr.pt_BR.fl_str_mv | Compreensão de programa; Átomos de confusão; Estudo empírico; Mineração de dados |
| dc.subject.en.pt_BR.fl_str_mv | Program comprehension; Atoms of confusion; Empirical study; Data mining |
| publishDate | 2024 |
| dc.date.accessioned.fl_str_mv | 2024-03-11T17:08:19Z |
| dc.date.available.fl_str_mv | 2024-03-11T17:08:19Z |
| dc.date.issued.fl_str_mv | 2024 |
| dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv | info:eu-repo/semantics/masterThesis |
| format | masterThesis |
| status_str | publishedVersion |
| dc.identifier.citation.fl_str_mv | PINHEIRO NETO, Francisco Oton. How they relate and leave: understanding atoms of confusion in open-source java projects. 2024. 75 f. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal do Ceará, Fortaleza, 2024. |
| dc.identifier.uri.fl_str_mv | http://repositorio.ufc.br/handle/riufc/76456 |
| identifier_str_mv | PINHEIRO NETO, Francisco Oton. How they relate and leave: understanding atoms of confusion in open-source java projects. 2024. 75 f. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal do Ceará, Fortaleza, 2024. |
| url | http://repositorio.ufc.br/handle/riufc/76456 |
| dc.language.iso.fl_str_mv | eng |
| language | eng |
| dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
| eu_rights_str_mv | openAccess |
| dc.source.none.fl_str_mv | reponame:Repositório Institucional da Universidade Federal do Ceará (UFC); instname:Universidade Federal do Ceará (UFC); instacron:UFC |
| instname_str | Universidade Federal do Ceará (UFC) |
| instacron_str | UFC |
| institution | UFC |
| reponame_str | Repositório Institucional da Universidade Federal do Ceará (UFC) |
| collection | Repositório Institucional da Universidade Federal do Ceará (UFC) |
| bitstream.url.fl_str_mv | http://repositorio.ufc.br/bitstream/riufc/76456/4/license.txt; http://repositorio.ufc.br/bitstream/riufc/76456/3/2024_dis_fopinheironeto.pdf |
| bitstream.checksum.fl_str_mv | 8a4605be74aa9ea9d79846c1fba20a33; 5f87926f3c62e4fcd36ee928ad33b51c |
| bitstream.checksumAlgorithm.fl_str_mv | MD5; MD5 |
| repository.name.fl_str_mv | Repositório Institucional da Universidade Federal do Ceará (UFC) - Universidade Federal do Ceará (UFC) |
| repository.mail.fl_str_mv | bu@ufc.br; repositorio@ufc.br |
| _version_ | 1847792434682003456 |