Machine Learning to Combine Static Analysis Alerts with Software Metrics to Detect Security Vulnerabilities: An Empirical Study
| Main author: | Pereira, Jose D'Abruzzo |
|---|---|
| Publication date: | 2021 |
| Other authors: | Campos, João R.; Vieira, Marco |
| Document type: | Article |
| Language: | eng |
| Source title: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| Full text: | https://hdl.handle.net/10316/117482, https://doi.org/10.1109/EDCC53658.2021.00008 |
| Abstract: | Software developers can use diverse techniques and tools to reduce the number of vulnerabilities, but the effectiveness of existing solutions in real projects is questionable. For example, Static Analysis Tools (SATs) report potential vulnerabilities by analyzing code patterns, and Software Metrics (SMs) can be used to predict vulnerabilities based on high-level characteristics of the code. In theory, both approaches can be applied from the early stages of the development process, but it is well known that they fail to detect critical vulnerabilities and raise a large number of false alarms. This paper studies the hypothesis of using Machine Learning (ML) to combine alerts from SATs with SMs to predict vulnerabilities in a large software project (under development for many years). In practice, we use four ML algorithms, alerts from two SATs, and a large number of SMs to predict whether a source code file is vulnerable or not (binary classification) and to predict the vulnerability category (multiclass classification). Results show that one can achieve either high precision or high recall, but not both at the same time. To understand the reason, we analyze and compare snippets of source code, demonstrating that vulnerable and non-vulnerable files share similar characteristics, making it hard to distinguish vulnerable from non-vulnerable code based on SAT alerts and SMs. |
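
The precision/recall trade-off reported in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: the `FileRecord` fields, the hand-made data, the weights, and the linear `risk_score` (a stand-in for a trained ML model) are all assumptions made for illustration only. The sketch shows that when vulnerable and non-vulnerable files have similar alert counts and metric values, moving the decision threshold trades one measure for the other.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    sat_alerts: int    # hypothetical: number of alerts raised by static analysis tools
    complexity: int    # hypothetical software metric (e.g. a complexity count)
    vulnerable: bool   # ground-truth label for the file


def risk_score(rec, w_alerts=1.0, w_complexity=0.1):
    """Toy linear score combining SAT alerts with one software metric."""
    return w_alerts * rec.sat_alerts + w_complexity * rec.complexity


def precision_recall(records, threshold):
    """Flag files whose score is >= threshold and compare against the labels."""
    tp = fp = fn = 0
    for rec in records:
        predicted = risk_score(rec) >= threshold
        if predicted and rec.vulnerable:
            tp += 1
        elif predicted and not rec.vulnerable:
            fp += 1
        elif not predicted and rec.vulnerable:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hand-made data: vulnerable and non-vulnerable files deliberately look alike.
FILES = [
    FileRecord(5, 40, True),
    FileRecord(4, 35, False),
    FileRecord(1, 10, True),
    FileRecord(3, 30, True),
    FileRecord(3, 28, False),
    FileRecord(0, 5, False),
    FileRecord(1, 12, False),
    FileRecord(6, 50, True),
]

if __name__ == "__main__":
    for threshold in (1.0, 5.0, 8.0):
        p, r = precision_recall(FILES, threshold)
        print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f}")
```

On this toy data, the lowest threshold catches every vulnerable file but flags several clean ones, while the highest threshold flags only vulnerable files but misses half of them. No threshold reaches both perfect precision and perfect recall, because a vulnerable file scores lower than some non-vulnerable ones.
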
| id | RCAP_22fed276583e7aa30d3e86371bf7f32c |
|---|---|
| oai_identifier_str | oai:estudogeral.uc.pt:10316/117482 |
| network_acronym_str | RCAP |
| network_name_str | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository_id_str | https://opendoar.ac.uk/repository/7160 |
| spelling (funding acknowledgement) | This work was partially funded by FCT grants 2020.04503.BD and SFRH/BD/140221/2018. This work has been partially supported by the project METRICS (reference POCI-01-0145-FEDER-032504), funded by the FCT. It is also partially supported by the project AIDA - Adaptive, Intelligent and Distributed Assurance Platform (reference POCI-01-0247-FEDER-045907), co-financed by the ERDF and COMPETE 2020 and by the FCT under CMU Portugal. |
| title | Machine Learning to Combine Static Analysis Alerts with Software Metrics to Detect Security Vulnerabilities: An Empirical Study |
| author | Pereira, Jose D'Abruzzo |
| author_role | author |
| author2 | Campos, João R.; Vieira, Marco |
| author2_role | author; author |
| dc.subject.por.fl_str_mv | Security; Vulnerability Detection; Static Code Analysis; Software Metrics |
| publishDate | 2021 |
| dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv | info:eu-repo/semantics/article |
| dc.identifier.uri.fl_str_mv | https://hdl.handle.net/10316/117482; https://doi.org/10.1109/EDCC53658.2021.00008 |
| dc.language.iso.fl_str_mv | eng |
| dc.relation.none.fl_str_mv | 978-1-6654-3671-7; https://ieeexplore.ieee.org/document/9603695 |
| dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
| dc.publisher.none.fl_str_mv | IEEE |
| instname_str | FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| instacron_str | RCAAP |
| repository.name.fl_str_mv | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| repository.mail.fl_str_mv | info@rcaap.pt |
| _version_ | 1833602607476637696 |