Automated vulnerability detection in source code
| Autor(a) principal: | |
|---|---|
| Data de Publicação: | 2023 |
| Tipo de documento: | Dissertação |
| Idioma: | eng |
| Título da fonte: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| Texto Completo: | http://hdl.handle.net/10400.22/23876 |
Resumo: | Technological advances have facilitated instant global connectivity, transforming the way we interact with the world. Software, propelled by this evolution, plays a pivotal role in our daily lives, being present in virtually every facet of our existence. Programmers, who form the bedrock of the business structure, create source code comprising hundreds or even thousands of lines, encompassing essential functionalities for software to operate seamlessly. However, owing to the inherent complexity of these functionalities and their interdependencies, it is common for errors to escape notice in the code, inadvertently reaching the software production phase and resulting in code vulnerabilities Each year, the number ofidentified software vulnerabilities, either publicly disclosed or discovered internally, increases. These vulnerabilities pose a significant risk of exploitation, potentially leading to data breaches or service interruptions. Therefore, the goal of this project is to develop a tool capable of analyzing code written in C and C++ to detect vulnerabilities before the code is deployed to end users. To achieve this goal, we leveraged existing work in this area by using a dataset of open-source functions written in C and C++. This dataset contains approximately 1.27 million functions categorized into five different Common Weakness Enumerations (CWEs). Preprocessing was performed to optimize the performance of the models used. The models were trained on function snippets only, without considering any external context of the code, thus simplifying the problem and increasing processing efficiency. The results obtained are promising, with the trained models showing high performance in identifying and classifying vulnerabilities. In addition, these results can serve as a benchmark for direct comparisons between different approaches. |
| id |
RCAP_d42a3ca91de4de3661e7d098ce7b25e4 |
|---|---|
| oai_identifier_str |
oai:recipp.ipp.pt:10400.22/23876 |
| network_acronym_str |
RCAP |
| network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository_id_str |
https://opendoar.ac.uk/repository/7160 |
| spelling |
Automated vulnerability detection in source codeVulnerability DetectionMachine LearningBinary ClassificationMulti-Label ClassificationNeural NetworkSource-CodeData ProcessingComputer SecurityCWEDeteção de VulnerabilidadesClassificação BináriaClassificação Multi-EtiquetaRede NeuralCódigo-FonteProcessamento de DadosSegurança InformáticaTechnological advances have facilitated instant global connectivity, transforming the way we interact with the world. Software, propelled by this evolution, plays a pivotal role in our daily lives, being present in virtually every facet of our existence. Programmers, who form the bedrock of the business structure, create source code comprising hundreds or even thousands of lines, encompassing essential functionalities for software to operate seamlessly. However, owing to the inherent complexity of these functionalities and their interdependencies, it is common for errors to escape notice in the code, inadvertently reaching the software production phase and resulting in code vulnerabilities Each year, the number ofidentified software vulnerabilities, either publicly disclosed or discovered internally, increases. These vulnerabilities pose a significant risk of exploitation, potentially leading to data breaches or service interruptions. Therefore, the goal of this project is to develop a tool capable of analyzing code written in C and C++ to detect vulnerabilities before the code is deployed to end users. To achieve this goal, we leveraged existing work in this area by using a dataset of open-source functions written in C and C++. This dataset contains approximately 1.27 million functions categorized into five different Common Weakness Enumerations (CWEs). Preprocessing was performed to optimize the performance of the models used. The models were trained on function snippets only, without considering any external context of the code, thus simplifying the problem and increasing processing efficiency. The results obtained are promising, with the trained models showing high performance in identifying and classifying vulnerabilities. In addition, these results can serve as a benchmark for direct comparisons between different approaches.Pereira, Isabel Cecília Correia da Silva Praça GomesREPOSITÓRIO P.PORTOBatista, Arthur Quintino2023-11-09T12:03:26Z20232023-01-01T00:00:00Zinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesisapplication/pdfhttp://hdl.handle.net/10400.22/23876urn:tid:203380290enginfo:eu-repo/semantics/openAccessreponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiainstacron:RCAAP2025-03-07T10:27:00Zoai:recipp.ipp.pt:10400.22/23876Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireinfo@rcaap.ptopendoar:https://opendoar.ac.uk/repository/71602025-05-29T00:54:58.039689Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiafalse |
| dc.title.none.fl_str_mv |
Automated vulnerability detection in source code |
| title |
Automated vulnerability detection in source code |
| spellingShingle |
Automated vulnerability detection in source code Batista, Arthur Quintino Vulnerability Detection Machine Learning Binary Classification Multi-Label Classification Neural Network Source-Code Data Processing Computer Security CWE Deteção de Vulnerabilidades Classificação Binária Classificação Multi-Etiqueta Rede Neural Código-Fonte Processamento de Dados Segurança Informática |
| title_short |
Automated vulnerability detection in source code |
| title_full |
Automated vulnerability detection in source code |
| title_fullStr |
Automated vulnerability detection in source code |
| title_full_unstemmed |
Automated vulnerability detection in source code |
| title_sort |
Automated vulnerability detection in source code |
| author |
Batista, Arthur Quintino |
| author_facet |
Batista, Arthur Quintino |
| author_role |
author |
| dc.contributor.none.fl_str_mv |
Pereira, Isabel Cecília Correia da Silva Praça Gomes REPOSITÓRIO P.PORTO |
| dc.contributor.author.fl_str_mv |
Batista, Arthur Quintino |
| dc.subject.por.fl_str_mv |
Vulnerability Detection Machine Learning Binary Classification Multi-Label Classification Neural Network Source-Code Data Processing Computer Security CWE Deteção de Vulnerabilidades Classificação Binária Classificação Multi-Etiqueta Rede Neural Código-Fonte Processamento de Dados Segurança Informática |
| topic |
Vulnerability Detection Machine Learning Binary Classification Multi-Label Classification Neural Network Source-Code Data Processing Computer Security CWE Deteção de Vulnerabilidades Classificação Binária Classificação Multi-Etiqueta Rede Neural Código-Fonte Processamento de Dados Segurança Informática |
| description |
Technological advances have facilitated instant global connectivity, transforming the way we interact with the world. Software, propelled by this evolution, plays a pivotal role in our daily lives, being present in virtually every facet of our existence. Programmers, who form the bedrock of the business structure, create source code comprising hundreds or even thousands of lines, encompassing essential functionalities for software to operate seamlessly. However, owing to the inherent complexity of these functionalities and their interdependencies, it is common for errors to escape notice in the code, inadvertently reaching the software production phase and resulting in code vulnerabilities Each year, the number ofidentified software vulnerabilities, either publicly disclosed or discovered internally, increases. These vulnerabilities pose a significant risk of exploitation, potentially leading to data breaches or service interruptions. Therefore, the goal of this project is to develop a tool capable of analyzing code written in C and C++ to detect vulnerabilities before the code is deployed to end users. To achieve this goal, we leveraged existing work in this area by using a dataset of open-source functions written in C and C++. This dataset contains approximately 1.27 million functions categorized into five different Common Weakness Enumerations (CWEs). Preprocessing was performed to optimize the performance of the models used. The models were trained on function snippets only, without considering any external context of the code, thus simplifying the problem and increasing processing efficiency. The results obtained are promising, with the trained models showing high performance in identifying and classifying vulnerabilities. In addition, these results can serve as a benchmark for direct comparisons between different approaches. |
| publishDate |
2023 |
| dc.date.none.fl_str_mv |
2023-11-09T12:03:26Z 2023 2023-01-01T00:00:00Z |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
| format |
masterThesis |
| status_str |
publishedVersion |
| dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10400.22/23876 urn:tid:203380290 |
| url |
http://hdl.handle.net/10400.22/23876 |
| identifier_str_mv |
urn:tid:203380290 |
| dc.language.iso.fl_str_mv |
eng |
| language |
eng |
| dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
| eu_rights_str_mv |
openAccess |
| dc.format.none.fl_str_mv |
application/pdf |
| dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP |
| instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| instacron_str |
RCAAP |
| institution |
RCAAP |
| reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| repository.mail.fl_str_mv |
info@rcaap.pt |
| _version_ |
1833600754983632896 |