Automated vulnerability detection in source code

Batista, Arthur Quintino

Bibliographic details
Main author:	Batista, Arthur Quintino
Publication date:	2023
Document type:	Dissertation
Language:	English
Source:	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Full text:	http://hdl.handle.net/10400.22/23876
Abstract:	Technological advances have enabled instant global connectivity, transforming the way we interact with the world. Software, propelled by this evolution, plays a pivotal role in daily life and is present in virtually every facet of our existence. Programmers, who form the bedrock of the business structure, write source code spanning hundreds or even thousands of lines that implements the functionality software needs to operate correctly. However, because of the inherent complexity of this functionality and its interdependencies, errors commonly go unnoticed, reach production, and result in code vulnerabilities. Each year, the number of identified software vulnerabilities, whether publicly disclosed or discovered internally, increases. These vulnerabilities pose a significant risk of exploitation, potentially leading to data breaches or service interruptions. The goal of this project is therefore to develop a tool that analyzes code written in C and C++ and detects vulnerabilities before the code is deployed to end users. To this end, we built on existing work in the area by using a dataset of open-source C and C++ functions. The dataset contains approximately 1.27 million functions categorized into five Common Weakness Enumeration (CWE) categories. The data was preprocessed to improve the performance of the models. The models were trained on function snippets only, without any external code context, which simplifies the problem and improves processing efficiency. The results are promising: the trained models showed high performance in both identifying and classifying vulnerabilities. In addition, these results can serve as a benchmark for direct comparisons between different approaches.
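
The keywords list both binary and multi-label classification, and the abstract describes labeling each function with five CWE categories. As an illustration only, the following Python sketch shows one common way to encode such targets for a single function. It is not taken from the thesis: the names `CWE_LABELS`, `multi_hot`, and `binary_target` and the placeholder labels are hypothetical (this record does not list the five CWEs), and the rule that a function counts as vulnerable when any CWE label is set is an assumption.

```python
from typing import Iterable

# Hypothetical placeholder names: the record says the dataset uses five CWE
# categories but does not list them, so these are not real CWE identifiers.
CWE_LABELS = ["CWE_A", "CWE_B", "CWE_C", "CWE_D", "CWE_E"]


def multi_hot(cwes: Iterable[str]) -> list[int]:
    """Multi-label target: one 0/1 entry per CWE category, in CWE_LABELS order."""
    present = set(cwes)
    unknown = present - set(CWE_LABELS)
    if unknown:
        raise ValueError(f"unknown CWE labels: {sorted(unknown)}")
    return [1 if label in present else 0 for label in CWE_LABELS]


def binary_target(vector: list[int]) -> int:
    """Binary target (assumption): vulnerable (1) if any CWE entry is set, else 0."""
    return int(any(vector))


if __name__ == "__main__":
    # A hypothetical function snippet tagged with two of the five categories.
    labels = multi_hot(["CWE_B", "CWE_E"])
    print(labels)                         # [0, 1, 0, 0, 1]
    print(binary_target(labels))          # 1
    print(binary_target(multi_hot([])))   # 0
```

Deriving both targets from one label set keeps the binary and multi-label tasks consistent with each other; whether the thesis derives its binary labels this way is not stated in this record.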

Item metadata

Record ID:	RCAP_d42a3ca91de4de3661e7d098ce7b25e4
OAI identifier:	oai:recipp.ipp.pt:10400.22/23876
Repository (OpenDOAR):	https://opendoar.ac.uk/repository/7160
Contributors:	Pereira, Isabel Cecília Correia da Silva Praça Gomes; REPOSITÓRIO P.PORTO
Keywords:	Vulnerability Detection; Machine Learning; Binary Classification; Multi-Label Classification; Neural Network; Source Code; Data Processing; Computer Security; CWE (also given in Portuguese)
Record date:	2023-11-09T12:03:26Z
Type:	Master's thesis (published version)
Other identifier:	urn:tid:203380290
Access rights:	Open access
Format:	PDF (application/pdf)
Institution:	FCCN, digital services of FCT (Fundação para a Ciência e a Tecnologia)
Repository contact:	info@rcaap.pt