Vulnerable Code Detection Using Software Metrics and Machine Learning
| Main author: | Medeiros, Nadia |
|---|---|
| Publication date: | 2020 |
| Other authors: | Ivaki, Naghmeh; Costa, Pedro; Vieira, Marco |
| Document type: | Article |
| Language: | eng |
| Source title: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| Full text: | https://hdl.handle.net/10316/101274 https://doi.org/10.1109/ACCESS.2020.3041181 |
Abstract: | Software metrics are widely used indicators of software quality and several studies have shown that such metrics can be used to estimate the presence of vulnerabilities in the code. In this paper, we present a comprehensive experiment to study how effective software metrics can be to distinguish the vulnerable code units from the non-vulnerable ones. To this end, we use several machine learning algorithms (Random Forest, Extreme Boosting, Decision Tree, SVM Linear, and SVM Radial) to extract vulnerability-related knowledge from software metrics collected from the source code of several representative software projects developed in C/C++ (Mozilla Firefox, Linux Kernel, Apache HTTPd, Xen, and Glibc). We consider different combinations of software metrics and diverse application scenarios with different security concerns (e.g., highly critical or non-critical systems). This experiment contributes to understanding whether software metrics can effectively be used to distinguish vulnerable code units in different application scenarios, and how machine learning algorithms can help in this regard. The main observation is that using machine learning algorithms on top of software metrics helps to indicate vulnerable code units with a relatively high level of confidence for security-critical software systems (where the focus is on detecting the maximum number of vulnerabilities, even if false positives are reported), but they are not helpful for low-critical or non-critical systems due to the high number of false positives (that bring an additional development cost frequently not affordable). |
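The trade-off the abstract describes between security-critical and non-critical scenarios can be illustrated with a minimal sketch. It assumes a classifier that outputs a vulnerability score per code unit and that each scenario is modeled as a different decision threshold; the scores, labels, and thresholds below are invented for illustration and are not taken from the paper, whose actual scenario definitions are not given in this record.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix counts for one decision threshold."""
    tp: int
    fp: int
    fn: int
    tn: int


def confusion(scores, labels, threshold):
    """Flag a code unit as vulnerable when its score is >= threshold."""
    tp = fp = fn = tn = 0
    for score, vulnerable in zip(scores, labels):
        flagged = score >= threshold
        if flagged and vulnerable:
            tp += 1
        elif flagged:
            fp += 1
        elif vulnerable:
            fn += 1
        else:
            tn += 1
    return Counts(tp, fp, fn, tn)


def recall(c):
    """Share of truly vulnerable units that were flagged."""
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def precision(c):
    """Share of flagged units that are truly vulnerable."""
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


# Hypothetical classifier scores and ground truth for ten code units.
SCORES = [0.95, 0.80, 0.65, 0.55, 0.45, 0.40, 0.30, 0.25, 0.15, 0.05]
LABELS = [True, True, False, True, False, False, True, False, False, False]

# Assumed thresholds: a low one for a security-critical scenario (catch
# everything, tolerate false alarms) and a high one for a non-critical
# scenario (avoid false alarms, accept misses).
critical = confusion(SCORES, LABELS, threshold=0.25)
non_critical = confusion(SCORES, LABELS, threshold=0.75)

for name, c in (("critical", critical), ("non-critical", non_critical)):
    print(f"{name}: recall={recall(c):.2f} precision={precision(c):.2f} "
          f"false positives={c.fp}")
```

On this toy data the low threshold finds every vulnerable unit at the cost of four false positives, while the high threshold reports no false positives but misses half of the vulnerable units, which mirrors the cost argument made in the abstract.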
| id |
RCAP_02ad9ff99dde0d9c1c71c08944acdb95 |
|---|---|
| oai_identifier_str |
oai:estudogeral.uc.pt:10316/101274 |
| network_acronym_str |
RCAP |
| network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository_id_str |
https://opendoar.ac.uk/repository/7160 |
| spelling |
Vulnerable Code Detection Using Software Metrics and Machine Learning; Application scenarios; machine learning; software metrics; software security; security vulnerabilities; Software metrics are widely used indicators of software quality and several studies have shown that such metrics can be used to estimate the presence of vulnerabilities in the code. In this paper, we present a comprehensive experiment to study how effective software metrics can be to distinguish the vulnerable code units from the non-vulnerable ones. To this end, we use several machine learning algorithms (Random Forest, Extreme Boosting, Decision Tree, SVM Linear, and SVM Radial) to extract vulnerability-related knowledge from software metrics collected from the source code of several representative software projects developed in C/C++ (Mozilla Firefox, Linux Kernel, Apache HTTPd, Xen, and Glibc). We consider different combinations of software metrics and diverse application scenarios with different security concerns (e.g., highly critical or non-critical systems). This experiment contributes to understanding whether software metrics can effectively be used to distinguish vulnerable code units in different application scenarios, and how machine learning algorithms can help in this regard. The main observation is that using machine learning algorithms on top of software metrics helps to indicate vulnerable code units with a relatively high level of confidence for security-critical software systems (where the focus is on detecting the maximum number of vulnerabilities, even if false positives are reported), but they are not helpful for low-critical or non-critical systems due to the high number of false positives (that bring an additional development cost frequently not affordable).; 2020; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article; https://hdl.handle.net/10316/101274; https://hdl.handle.net/10316/101274; https://doi.org/10.1109/ACCESS.2020.3041181; eng; 2169-3536; Medeiros, Nadia; Ivaki, Naghmeh; Costa, Pedro; Vieira, Marco; info:eu-repo/semantics/openAccess; reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP); instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; instacron:RCAAP; 2022-08-19T20:39:21Z; oai:estudogeral.uc.pt:10316/101274; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; info@rcaap.pt; opendoar:https://opendoar.ac.uk/repository/7160; 2025-05-29T05:50:43.558013; Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; false |
| dc.title.none.fl_str_mv |
Vulnerable Code Detection Using Software Metrics and Machine Learning |
| title |
Vulnerable Code Detection Using Software Metrics and Machine Learning |
| spellingShingle |
Vulnerable Code Detection Using Software Metrics and Machine Learning; Medeiros, Nadia; Application scenarios; machine learning; software metrics; software security; security vulnerabilities |
| title_short |
Vulnerable Code Detection Using Software Metrics and Machine Learning |
| title_full |
Vulnerable Code Detection Using Software Metrics and Machine Learning |
| title_fullStr |
Vulnerable Code Detection Using Software Metrics and Machine Learning |
| title_full_unstemmed |
Vulnerable Code Detection Using Software Metrics and Machine Learning |
| title_sort |
Vulnerable Code Detection Using Software Metrics and Machine Learning |
| author |
Medeiros, Nadia |
| author_facet |
Medeiros, Nadia; Ivaki, Naghmeh; Costa, Pedro; Vieira, Marco |
| author_role |
author |
| author2 |
Ivaki, Naghmeh; Costa, Pedro; Vieira, Marco |
| author2_role |
author author author |
| dc.contributor.author.fl_str_mv |
Medeiros, Nadia; Ivaki, Naghmeh; Costa, Pedro; Vieira, Marco |
| dc.subject.por.fl_str_mv |
Application scenarios; machine learning; software metrics; software security; security vulnerabilities |
| topic |
Application scenarios; machine learning; software metrics; software security; security vulnerabilities |
| description |
Software metrics are widely used indicators of software quality and several studies have shown that such metrics can be used to estimate the presence of vulnerabilities in the code. In this paper, we present a comprehensive experiment to study how effective software metrics can be to distinguish the vulnerable code units from the non-vulnerable ones. To this end, we use several machine learning algorithms (Random Forest, Extreme Boosting, Decision Tree, SVM Linear, and SVM Radial) to extract vulnerability-related knowledge from software metrics collected from the source code of several representative software projects developed in C/C++ (Mozilla Firefox, Linux Kernel, Apache HTTPd, Xen, and Glibc). We consider different combinations of software metrics and diverse application scenarios with different security concerns (e.g., highly critical or non-critical systems). This experiment contributes to understanding whether software metrics can effectively be used to distinguish vulnerable code units in different application scenarios, and how machine learning algorithms can help in this regard. The main observation is that using machine learning algorithms on top of software metrics helps to indicate vulnerable code units with a relatively high level of confidence for security-critical software systems (where the focus is on detecting the maximum number of vulnerabilities, even if false positives are reported), but they are not helpful for low-critical or non-critical systems due to the high number of false positives (that bring an additional development cost frequently not affordable). |
| publishDate |
2020 |
| dc.date.none.fl_str_mv |
2020 |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
| format |
article |
| status_str |
publishedVersion |
| dc.identifier.uri.fl_str_mv |
https://hdl.handle.net/10316/101274 https://doi.org/10.1109/ACCESS.2020.3041181 |
| url |
https://hdl.handle.net/10316/101274 https://doi.org/10.1109/ACCESS.2020.3041181 |
| dc.language.iso.fl_str_mv |
eng |
| language |
eng |
| dc.relation.none.fl_str_mv |
2169-3536 |
| dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
| eu_rights_str_mv |
openAccess |
| dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP |
| instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| instacron_str |
RCAAP |
| institution |
RCAAP |
| reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| repository.mail.fl_str_mv |
info@rcaap.pt |
| _version_ |
1833602491720138752 |