An Exploratory Study on Machine Learning to Combine Security Vulnerability Alerts from Static Analysis Tools

Bibliographic details
Main author: Pereira, Jose D'Abruzzo
Publication date: 2019
Other authors: Campos, João R., Vieira, Marco
Document type: Article
Language: eng
Source title: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Full text: https://hdl.handle.net/10316/117479
https://doi.org/10.1109/LADC48089.2019.8995685
Abstract: Due to time-to-market needs and cost of manual validation techniques, software systems are often deployed with vulnerabilities that may be exploited to gain illegitimate access/control, ultimately resulting in non-negligible consequences. Static Analysis Tools (SATs) are widely used for vulnerability detection, where the source code is analyzed without executing it. However, the performance of SATs varies considerably and a high detection rate usually comes with significant false alarms. Recent studies considered combining various SATs to improve the overall detection ability, but they do not allow exploring different performance trade-offs, as basic and rigid rules are normally followed. Machine Learning (ML) algorithms have shown promising results in several complex problems, due to their ability to fit specific needs. This paper presents an exploratory study on the combination of the output of SATs through ML algorithms to improve vulnerability detection while trying to reduce false alarms. The dataset consists of SQL Injection (SQLi) and Cross-Site Scripting (XSS) vulnerabilities detected by five different SATs in a large set of WordPress plugins developed in PHP. Results show that, for the case of SQLi, a false alarm reduction is possible without compromising the vulnerabilities detected, and that using ML allows trade-offs (e.g., reduction in false alarms at the expense of a few vulnerabilities) that are not possible with existing techniques. The paper also proposes a regression-based approach for ranking source code files considering estimates of vulnerabilities computed using the output of SATs. Results show that the approach allows creating a ranking of the source code files that largely overlaps the real ranking (based on real known vulnerabilities).
id RCAP_9ec7f1ad151856a77a682f0281a5511a
oai_identifier_str oai:estudogeral.uc.pt:10316/117479
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
funding This work was partially funded by FCT grant no. SFRH/BD/140221/2018, project ATMOSPHERE, funded by the European Commission under the Cooperation Programme, H2020 grant agreement no. 777154, and project METRICS, funded by the FCT – agreement no POCI-01-0145-FEDER-032504.
author Pereira, Jose D'Abruzzo
author_facet Pereira, Jose D'Abruzzo
Campos, João R.
Vieira, Marco
author_role author
author2 Campos, João R.
Vieira, Marco
author2_role author
author
dc.contributor.author.fl_str_mv Pereira, Jose D'Abruzzo
Campos, João R.
Vieira, Marco
dc.subject.por.fl_str_mv Security
Vulnerability Detection
Static Code Analysis
Machine Learning
topic Security
Vulnerability Detection
Static Code Analysis
Machine Learning
publishDate 2019
dc.date.none.fl_str_mv 2019
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://hdl.handle.net/10316/117479
https://hdl.handle.net/10316/117479
https://doi.org/10.1109/LADC48089.2019.8995685
url https://hdl.handle.net/10316/117479
https://doi.org/10.1109/LADC48089.2019.8995685
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 978-1-7281-6622-3
https://ieeexplore.ieee.org/document/8995685
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv IEEE
publisher.none.fl_str_mv IEEE
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833602607470346240