Pré-processamento inteligente de requisitos textuais de software para refinamento de estimativa de esforço

Koxne, Cristiano

Bibliographic Details
Main Author:	Koxne, Cristiano
Publication Date:	2024
Format:	Bachelor thesis
Language:	por
Source:	Repositório Institucional da UTFPR (da Universidade Tecnológica Federal do Paraná (RIUT))
Download full:	http://repositorio.utfpr.edu.br/jspui/handle/1/34386
Summary:	This work applies Artificial Intelligence (AI) methods and techniques, with an emphasis on Natural Language Processing (NLP), to improve software effort estimation by analogy (EESA). Analyzing requirement texts (e.g., user stories) yields features for text classification, which make it possible to automate EESA and improve its accuracy. When requirements are presented in textual format, effort estimation by analogy requires a deep understanding of historical data. In this context, this study proposes an intelligent process for pre-processing textual software requirements in order to refine effort estimates. The goal is to classify user requirements efficiently and use this categorization to infer effort estimates. The methodology is inspired by previous studies that explored the pre-trained models GPT-3 and BERT for EESA. This study extends that approach by focusing on the textual pre-processing method, including the application of GPT-3.5. Two distinct pre-processing strategies were defined. The first uses a conventional method based on Regular Expressions (RegEx). The second uses the GPT-3.5 API to obtain a pre-processed dataset, with the model producing clean and normalized texts. After pre-processing, two experiments were conducted for EESA inference using two different methods: one based on the author's own neural network architecture and the other on the GPT-3.5 API. The metrics Mean Squared Error (MSE), Mean Absolute Error (MAE), and Median Absolute Error (MdAE) were computed. The results of the two experiments were then compared with those of previous studies (Z-SE2 by Baratto (2022) and SE3M by Fávero, Casanova, and Pimentel (2022)). The most notable result is an MAE of 0.56 ± 0.15, obtained in the experiment that used inter-repository partitioning with the author's own neural model.
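
The three error metrics named in the summary have standard definitions, which the following minimal Python sketch illustrates. The function name `error_metrics` and the sample values are hypothetical and chosen only for illustration; they are not taken from the thesis and do not reproduce its results.

```python
from statistics import mean, median


def error_metrics(actual, predicted):
    """Return MSE, MAE and MdAE for paired effort values (hypothetical helper)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Absolute error of each estimate against the recorded effort.
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return {
        "MSE": mean([e * e for e in abs_errors]),  # Mean Squared Error
        "MAE": mean(abs_errors),                   # Mean Absolute Error
        "MdAE": median(abs_errors),                # Median Absolute Error
    }


# Illustrative values only (e.g., story points); not data from the thesis.
actual = [1.0, 2.0, 3.0, 5.0]
predicted = [1.0, 3.0, 3.0, 8.0]
metrics = error_metrics(actual, predicted)
print(metrics)
```

MdAE is less sensitive than MAE and MSE to a few very large errors, which is one common reason for reporting it alongside them.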

Item metadata

id	UTFPR-12_e3a08c52cd76a652e462d5f8c8094ff6
oai_identifier_str	oai:repositorio.utfpr.edu.br:1/34386
network_acronym_str	UTFPR-12
network_name_str	Repositório Institucional da UTFPR (da Universidade Tecnológica Federal do Paraná (RIUT))
repository_id_str
dc.title.none.fl_str_mv	Pré-processamento inteligente de requisitos textuais de software para refinamento de estimativa de esforço Intelligent preprocessing of textual requirements software for effort estimation refinement
title	Pré-processamento inteligente de requisitos textuais de software para refinamento de estimativa de esforço
spellingShingle	Pré-processamento inteligente de requisitos textuais de software para refinamento de estimativa de esforço Koxne, Cristiano Processamento de linguagem natural (Computação) Aprendizado do computador Inteligência Artificial Linguagem de programação (Computadores) Natural language processing (Computer science) Machine learning Artificial intelligence Programming languages (Electronic computers) CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
title_short	Pré-processamento inteligente de requisitos textuais de software para refinamento de estimativa de esforço
title_full	Pré-processamento inteligente de requisitos textuais de software para refinamento de estimativa de esforço
title_fullStr	Pré-processamento inteligente de requisitos textuais de software para refinamento de estimativa de esforço
title_full_unstemmed	Pré-processamento inteligente de requisitos textuais de software para refinamento de estimativa de esforço
title_sort	Pré-processamento inteligente de requisitos textuais de software para refinamento de estimativa de esforço
author	Koxne, Cristiano
author_facet	Koxne, Cristiano
author_role	author
dc.contributor.none.fl_str_mv	Fávero, Eliane Maria de Bortoli Dal Molin, Viviane Fávero, Eliane Maria de Bortoli Dal Molin, Viviane Ascari, Soelaine Rodrigues Casanova, Dalcimar
dc.contributor.author.fl_str_mv	Koxne, Cristiano
dc.subject.por.fl_str_mv	Processamento de linguagem natural (Computação) Aprendizado do computador Inteligência Artificial Linguagem de programação (Computadores) Natural language processing (Computer science) Machine learning Artificial intelligence Programming languages (Electronic computers) CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
topic	Processamento de linguagem natural (Computação) Aprendizado do computador Inteligência Artificial Linguagem de programação (Computadores) Natural language processing (Computer science) Machine learning Artificial intelligence Programming languages (Electronic computers) CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
description	This work applies Artificial Intelligence (AI) methods and techniques, with an emphasis on Natural Language Processing (NLP), to improve software effort estimation by analogy (EESA). Analyzing requirement texts (e.g., user stories) yields features for text classification, which make it possible to automate EESA and improve its accuracy. When requirements are presented in textual format, effort estimation by analogy requires a deep understanding of historical data. In this context, this study proposes an intelligent process for pre-processing textual software requirements in order to refine effort estimates. The goal is to classify user requirements efficiently and use this categorization to infer effort estimates. The methodology is inspired by previous studies that explored the pre-trained models GPT-3 and BERT for EESA. This study extends that approach by focusing on the textual pre-processing method, including the application of GPT-3.5. Two distinct pre-processing strategies were defined. The first uses a conventional method based on Regular Expressions (RegEx). The second uses the GPT-3.5 API to obtain a pre-processed dataset, with the model producing clean and normalized texts. After pre-processing, two experiments were conducted for EESA inference using two different methods: one based on the author's own neural network architecture and the other on the GPT-3.5 API. The metrics Mean Squared Error (MSE), Mean Absolute Error (MAE), and Median Absolute Error (MdAE) were computed. The results of the two experiments were then compared with those of previous studies (Z-SE2 by Baratto (2022) and SE3M by Fávero, Casanova, and Pimentel (2022)). The most notable result is an MAE of 0.56 ± 0.15, obtained in the experiment that used inter-repository partitioning with the author's own neural model.
publishDate	2024
dc.date.none.fl_str_mv	2024-08-09T15:24:58Z 2024-08-09T15:24:58Z 2024-06-19
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/bachelorThesis
format	bachelorThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	KOXNE, Cristiano. Pré-processamento inteligente de requisitos textuais de software para refinamento de estimativa de esforço. 2024. Trabalho de Conclusão de Curso (Bacharelado em Engenharia de Computação) - Universidade Tecnológica Federal do Paraná, Pato Branco, 2024. http://repositorio.utfpr.edu.br/jspui/handle/1/34386
identifier_str_mv	KOXNE, Cristiano. Pré-processamento inteligente de requisitos textuais de software para refinamento de estimativa de esforço. 2024. Trabalho de Conclusão de Curso (Bacharelado em Engenharia de Computação) - Universidade Tecnológica Federal do Paraná, Pato Branco, 2024.
url	http://repositorio.utfpr.edu.br/jspui/handle/1/34386
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	http://creativecommons.org/licenses/by/4.0/ info:eu-repo/semantics/openAccess
rights_invalid_str_mv	http://creativecommons.org/licenses/by/4.0/
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Tecnológica Federal do Paraná Pato Branco Brasil Departamento Acadêmico de Informática Engenharia de Computação UTFPR
publisher.none.fl_str_mv	Universidade Tecnológica Federal do Paraná Pato Branco Brasil Departamento Acadêmico de Informática Engenharia de Computação UTFPR
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UTFPR (da Universidade Tecnológica Federal do Paraná (RIUT)) instname:Universidade Tecnológica Federal do Paraná (UTFPR) instacron:UTFPR
instname_str	Universidade Tecnológica Federal do Paraná (UTFPR)
instacron_str	UTFPR
institution	UTFPR
reponame_str	Repositório Institucional da UTFPR (da Universidade Tecnológica Federal do Paraná (RIUT))
collection	Repositório Institucional da UTFPR (da Universidade Tecnológica Federal do Paraná (RIUT))
repository.name.fl_str_mv	Repositório Institucional da UTFPR (da Universidade Tecnológica Federal do Paraná (RIUT)) - Universidade Tecnológica Federal do Paraná (UTFPR)
repository.mail.fl_str_mv	riut@utfpr.edu.br || sibi@utfpr.edu.br
_version_	1850498045374889984

Similar Items