Automated vulnerability detection in source code

Bibliographic details
Main author: Batista, Arthur Quintino
Publication date: 2023
Document type: Master's dissertation
Language: English
Source title: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Full text: http://hdl.handle.net/10400.22/23876
Abstract: Technological advances have facilitated instant global connectivity, transforming the way we interact with the world. Software, propelled by this evolution, plays a pivotal role in our daily lives, being present in virtually every facet of our existence. Programmers, who form the bedrock of the business structure, create source code comprising hundreds or even thousands of lines, encompassing essential functionalities for software to operate seamlessly. However, owing to the inherent complexity of these functionalities and their interdependencies, it is common for errors to escape notice in the code, inadvertently reaching the software production phase and resulting in code vulnerabilities. Each year, the number of identified software vulnerabilities, either publicly disclosed or discovered internally, increases. These vulnerabilities pose a significant risk of exploitation, potentially leading to data breaches or service interruptions. Therefore, the goal of this project is to develop a tool capable of analyzing code written in C and C++ to detect vulnerabilities before the code is deployed to end users. To achieve this goal, we leveraged existing work in this area by using a dataset of open-source functions written in C and C++. This dataset contains approximately 1.27 million functions categorized into five different Common Weakness Enumerations (CWEs). Preprocessing was performed to optimize the performance of the models used. The models were trained on function snippets only, without considering any external context of the code, thus simplifying the problem and increasing processing efficiency. The results obtained are promising, with the trained models showing high performance in identifying and classifying vulnerabilities. In addition, these results can serve as a benchmark for direct comparisons between different approaches.
id RCAP_d42a3ca91de4de3661e7d098ce7b25e4
oai_identifier_str oai:recipp.ipp.pt:10400.22/23876
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
author Batista, Arthur Quintino
author_facet Batista, Arthur Quintino
author_role author
dc.contributor.none.fl_str_mv Pereira, Isabel Cecília Correia da Silva Praça Gomes
REPOSITÓRIO P.PORTO
dc.contributor.author.fl_str_mv Batista, Arthur Quintino
dc.subject.por.fl_str_mv Vulnerability Detection
Machine Learning
Binary Classification
Multi-Label Classification
Neural Network
Source-Code
Data Processing
Computer Security
CWE
Deteção de Vulnerabilidades
Classificação Binária
Classificação Multi-Etiqueta
Rede Neural
Código-Fonte
Processamento de Dados
Segurança Informática
publishDate 2023
dc.date.none.fl_str_mv 2023-11-09T12:03:26Z
2023
2023-01-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10400.22/23876
urn:tid:203380290
url http://hdl.handle.net/10400.22/23876
identifier_str_mv urn:tid:203380290
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833600754983632896