Vulnerable Code Detection Using Software Metrics and Machine Learning

Detalhes bibliográficos
Autor(a) principal: Medeiros, Nadia
Data de Publicação: 2020
Outros Autores: Ivaki, Naghmeh, Costa, Pedro, Vieira, Marco
Tipo de documento: Artigo
Idioma: eng
Título da fonte: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Texto Completo: https://hdl.handle.net/10316/101274
https://doi.org/10.1109/ACCESS.2020.3041181
Resumo: Software metrics are widely-used indicators of software quality and several studies have shown that such metrics can be used to estimate the presence of vulnerabilities in the code. In this paper, we present a comprehensive experiment to study how effective software metrics can be to distinguish the vulnerable code units from the non-vulnerable ones. To this end, we use several machine learning algorithms (Random Forest, Extreme Boosting, Decision Tree, SVM Linear, and SVM Radial) to extract vulnerability-related knowledge from software metrics collected from the source code of several representative software projects developed in C/CCC (Mozilla Firefox, Linux Kernel, Apache HTTPd, Xen, and Glibc). We consider different combinations of software metrics and diverse application scenarios with different security concerns (e.g., highly critical or non-critical systems). This experiment contributes to understanding whether software metrics can effectively be used to distinguish vulnerable code units in different application scenarios, and howcan machine learning algorithms help in this regard. The main observation is that using machine learning algorithms on top of software metrics helps to indicate vulnerable code units with a relatively high level of con dence for security-critical software systems (where the focus is on detecting the maximum number of vulnerabilities, even if false positives are reported), but they are not helpful for low-critical or non-critical systems due to the high number of false positives (that bring an additional development cost frequently not affordable).
id RCAP_02ad9ff99dde0d9c1c71c08944acdb95
oai_identifier_str oai:estudogeral.uc.pt:10316/101274
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
spelling Vulnerable Code Detection Using Software Metrics and Machine LearningApplication scenariosmachine learningsoftware metricssoftware securitysecurity vulnerabilitiesSoftware metrics are widely-used indicators of software quality and several studies have shown that such metrics can be used to estimate the presence of vulnerabilities in the code. In this paper, we present a comprehensive experiment to study how effective software metrics can be to distinguish the vulnerable code units from the non-vulnerable ones. To this end, we use several machine learning algorithms (Random Forest, Extreme Boosting, Decision Tree, SVM Linear, and SVM Radial) to extract vulnerability-related knowledge from software metrics collected from the source code of several representative software projects developed in C/CCC (Mozilla Firefox, Linux Kernel, Apache HTTPd, Xen, and Glibc). We consider different combinations of software metrics and diverse application scenarios with different security concerns (e.g., highly critical or non-critical systems). This experiment contributes to understanding whether software metrics can effectively be used to distinguish vulnerable code units in different application scenarios, and howcan machine learning algorithms help in this regard. The main observation is that using machine learning algorithms on top of software metrics helps to indicate vulnerable code units with a relatively high level of con dence for security-critical software systems (where the focus is on detecting the maximum number of vulnerabilities, even if false positives are reported), but they are not helpful for low-critical or non-critical systems due to the high number of false positives (that bring an additional development cost frequently not affordable).2020info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/articlehttps://hdl.handle.net/10316/101274https://hdl.handle.net/10316/101274https://doi.org/10.1109/ACCESS.2020.3041181eng2169-3536Medeiros, NadiaIvaki, NaghmehCosta, PedroVieira, Marcoinfo:eu-repo/semantics/openAccessreponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiainstacron:RCAAP2022-08-19T20:39:21Zoai:estudogeral.uc.pt:10316/101274Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireinfo@rcaap.ptopendoar:https://opendoar.ac.uk/repository/71602025-05-29T05:50:43.558013Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiafalse
dc.title.none.fl_str_mv Vulnerable Code Detection Using Software Metrics and Machine Learning
title Vulnerable Code Detection Using Software Metrics and Machine Learning
spellingShingle Vulnerable Code Detection Using Software Metrics and Machine Learning
Medeiros, Nadia
Application scenarios
machine learning
software metrics
software security
security vulnerabilities
title_short Vulnerable Code Detection Using Software Metrics and Machine Learning
title_full Vulnerable Code Detection Using Software Metrics and Machine Learning
title_fullStr Vulnerable Code Detection Using Software Metrics and Machine Learning
title_full_unstemmed Vulnerable Code Detection Using Software Metrics and Machine Learning
title_sort Vulnerable Code Detection Using Software Metrics and Machine Learning
author Medeiros, Nadia
author_facet Medeiros, Nadia
Ivaki, Naghmeh
Costa, Pedro
Vieira, Marco
author_role author
author2 Ivaki, Naghmeh
Costa, Pedro
Vieira, Marco
author2_role author
author
author
dc.contributor.author.fl_str_mv Medeiros, Nadia
Ivaki, Naghmeh
Costa, Pedro
Vieira, Marco
dc.subject.por.fl_str_mv Application scenarios
machine learning
software metrics
software security
security vulnerabilities
topic Application scenarios
machine learning
software metrics
software security
security vulnerabilities
description Software metrics are widely-used indicators of software quality and several studies have shown that such metrics can be used to estimate the presence of vulnerabilities in the code. In this paper, we present a comprehensive experiment to study how effective software metrics can be to distinguish the vulnerable code units from the non-vulnerable ones. To this end, we use several machine learning algorithms (Random Forest, Extreme Boosting, Decision Tree, SVM Linear, and SVM Radial) to extract vulnerability-related knowledge from software metrics collected from the source code of several representative software projects developed in C/CCC (Mozilla Firefox, Linux Kernel, Apache HTTPd, Xen, and Glibc). We consider different combinations of software metrics and diverse application scenarios with different security concerns (e.g., highly critical or non-critical systems). This experiment contributes to understanding whether software metrics can effectively be used to distinguish vulnerable code units in different application scenarios, and howcan machine learning algorithms help in this regard. The main observation is that using machine learning algorithms on top of software metrics helps to indicate vulnerable code units with a relatively high level of con dence for security-critical software systems (where the focus is on detecting the maximum number of vulnerabilities, even if false positives are reported), but they are not helpful for low-critical or non-critical systems due to the high number of false positives (that bring an additional development cost frequently not affordable).
publishDate 2020
dc.date.none.fl_str_mv 2020
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://hdl.handle.net/10316/101274
https://hdl.handle.net/10316/101274
https://doi.org/10.1109/ACCESS.2020.3041181
url https://hdl.handle.net/10316/101274
https://doi.org/10.1109/ACCESS.2020.3041181
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 2169-3536
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833602491720138752