Vulnerable Code Detection Using Software Metrics and Machine Learning

Medeiros, Nadia; Ivaki, Naghmeh; Costa, Pedro; Vieira, Marco

Bibliographic details
Main author:	Medeiros, Nadia
Publication date:	2020
Other authors:	Ivaki, Naghmeh; Costa, Pedro; Vieira, Marco
Document type:	Article
Language:	eng
Source title:	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Full text:	https://hdl.handle.net/10316/101274 https://doi.org/10.1109/ACCESS.2020.3041181
Abstract:	Software metrics are widely used indicators of software quality, and several studies have shown that such metrics can be used to estimate the presence of vulnerabilities in the code. In this paper, we present a comprehensive experiment to study how effective software metrics can be in distinguishing vulnerable code units from non-vulnerable ones. To this end, we use several machine learning algorithms (Random Forest, Extreme Boosting, Decision Tree, SVM Linear, and SVM Radial) to extract vulnerability-related knowledge from software metrics collected from the source code of several representative software projects developed in C/C++ (Mozilla Firefox, Linux Kernel, Apache HTTPd, Xen, and Glibc). We consider different combinations of software metrics and diverse application scenarios with different security concerns (e.g., highly critical or non-critical systems). This experiment contributes to understanding whether software metrics can effectively be used to distinguish vulnerable code units in different application scenarios, and how machine learning algorithms can help in this regard. The main observation is that using machine learning algorithms on top of software metrics helps to indicate vulnerable code units with a relatively high level of confidence for security-critical software systems (where the focus is on detecting the maximum number of vulnerabilities, even if false positives are reported), but that they are not helpful for low-critical or non-critical systems due to the high number of false positives (which bring an additional development cost that is frequently not affordable).

Item metadata

id	RCAP_02ad9ff99dde0d9c1c71c08944acdb95
oai_identifier_str	oai:estudogeral.uc.pt:10316/101274
network_acronym_str	RCAP
network_name_str	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str	https://opendoar.ac.uk/repository/7160
dc.title.none.fl_str_mv	Vulnerable Code Detection Using Software Metrics and Machine Learning
title	Vulnerable Code Detection Using Software Metrics and Machine Learning
spellingShingle	Vulnerable Code Detection Using Software Metrics and Machine Learning Medeiros, Nadia Application scenarios machine learning software metrics software security security vulnerabilities
title_short	Vulnerable Code Detection Using Software Metrics and Machine Learning
title_full	Vulnerable Code Detection Using Software Metrics and Machine Learning
title_fullStr	Vulnerable Code Detection Using Software Metrics and Machine Learning
title_full_unstemmed	Vulnerable Code Detection Using Software Metrics and Machine Learning
title_sort	Vulnerable Code Detection Using Software Metrics and Machine Learning
author	Medeiros, Nadia
author_facet	Medeiros, Nadia; Ivaki, Naghmeh; Costa, Pedro; Vieira, Marco
author_role	author
author2	Ivaki, Naghmeh; Costa, Pedro; Vieira, Marco
author2_role	author author author
dc.contributor.author.fl_str_mv	Medeiros, Nadia; Ivaki, Naghmeh; Costa, Pedro; Vieira, Marco
dc.subject.por.fl_str_mv	Application scenarios; machine learning; software metrics; software security; security vulnerabilities
topic	Application scenarios; machine learning; software metrics; software security; security vulnerabilities
description	Software metrics are widely used indicators of software quality, and several studies have shown that such metrics can be used to estimate the presence of vulnerabilities in the code. In this paper, we present a comprehensive experiment to study how effective software metrics can be in distinguishing vulnerable code units from non-vulnerable ones. To this end, we use several machine learning algorithms (Random Forest, Extreme Boosting, Decision Tree, SVM Linear, and SVM Radial) to extract vulnerability-related knowledge from software metrics collected from the source code of several representative software projects developed in C/C++ (Mozilla Firefox, Linux Kernel, Apache HTTPd, Xen, and Glibc). We consider different combinations of software metrics and diverse application scenarios with different security concerns (e.g., highly critical or non-critical systems). This experiment contributes to understanding whether software metrics can effectively be used to distinguish vulnerable code units in different application scenarios, and how machine learning algorithms can help in this regard. The main observation is that using machine learning algorithms on top of software metrics helps to indicate vulnerable code units with a relatively high level of confidence for security-critical software systems (where the focus is on detecting the maximum number of vulnerabilities, even if false positives are reported), but that they are not helpful for low-critical or non-critical systems due to the high number of false positives (which bring an additional development cost that is frequently not affordable).
publishDate	2020
dc.date.none.fl_str_mv	2020
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://hdl.handle.net/10316/101274 https://hdl.handle.net/10316/101274 https://doi.org/10.1109/ACCESS.2020.3041181
url	https://hdl.handle.net/10316/101274 https://doi.org/10.1109/ACCESS.2020.3041181
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	2169-3536
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.source.none.fl_str_mv	reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP
instname_str	FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv	info@rcaap.pt
_version_	1833602491720138752
