A Hybrid Machine Learning System for Vulnerability Detection in Web Applications
| Main Author: | Oliveira, Miguel César de Albuquerque |
|---|---|
| Publication Date: | 2023 |
| Format: | Master's thesis |
| Language: | eng |
| Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| Download full: | http://hdl.handle.net/10451/63629 |
| Summary: | Master's thesis, Data Science, 2023, Universidade de Lisboa, Faculdade de Ciências |
| id | RCAP_41bd93c6b8100dd7113f7cc4ce591309 |
|---|---|
| oai_identifier_str | oai:repositorio.ulisboa.pt:10451/63629 |
| network_acronym_str | RCAP |
| network_name_str | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository_id_str | https://opendoar.ac.uk/repository/7160 |
| spelling | A Hybrid Machine Learning System for Vulnerability Detection in Web Applications; web vulnerability detection; machine learning; anomaly detection; natural language processing; software security; Master's theses - 2024; Departamento de Informática; Master's thesis, Data Science, 2023, Universidade de Lisboa, Faculdade de Ciências
Security in web applications is often compromised by poorly written code that is exploited by attackers. Source code vulnerability detection tools have been developed using static analysis and machine learning techniques. The best performing tools aim for very low false negative rates along with acceptable false positive rates. Static analysis requires manual programming to identify vulnerabilities, depends on human expertise, and is usually limited to a specific programming language. On the other hand, the classical supervised machine learning approaches used previously may be limited in their ability to identify zero-day vulnerabilities or prone to overfitting because few datasets are available. This dissertation aims to develop a hybrid machine learning (ML) system for vulnerability detection in web applications. The system uses a combination of static analysis and Natural Language Processing (NLP) techniques to identify functions related to vulnerabilities, which are then used to build representative datasets. The datasets serve as input for unsupervised machine learning and other behaviour-based anomaly detection algorithms that flag the code snippets under analysis as suspicious. For these source code snippets, the system aims to confirm which are vulnerable and to identify the type of vulnerability via supervised machine learning techniques. The dissertation explores a novel approach to vulnerability detection by combining unsupervised anomaly detection models with supervised machine learning and Natural Language Processing techniques.
Previous research in vulnerability detection has primarily focused on either unsupervised or supervised methods, neglecting the potential benefits of a hybrid approach. The goal of this research is to investigate the efficacy of hybrid architectures in identifying software vulnerabilities and to determine the optimal machine learning models and datasets for this purpose. The proposed hybrid model consists of several layers. The first uses a One-Class Support Vector Machine (OCSVM) model to detect anomalies; the second employs a Random Forest model to confirm the presence of vulnerabilities in those anomalies. The type of vulnerability is classified by a Logistic Regression model that relies on the Doc2Vec model for feature extraction. The research includes experimentation with various machine learning models and datasets, evaluating features that range from simple binary features to more complex Doc2Vec embeddings. The thesis demonstrates OCSVM's suitability for semi-unsupervised anomaly detection, yielding promising results across various datasets. Additionally, the study assesses the effectiveness of Random Forests in classifying vulnerable source code snippets based on OCSVM-detected anomalies and validates the use of NLP techniques for feature extraction from source code snippets. Overall, the proposed hybrid model achieved an accuracy of 65%. Although this result seems low, the research offers a promising hybrid approach to vulnerability detection, leveraging the strengths of unsupervised and supervised machine learning models.
The findings suggest opportunities for further enhancements and optimizations, paving the way for more effective software vulnerability detection systems.
Medeiros, Ibéria Vitória de Sousa, 1971-; Repositório da Universidade de Lisboa; Oliveira, Miguel César de Albuquerque; 2024-03-21T10:29:12Z; 2024; 2023; 2024-01-01T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; application/pdf; http://hdl.handle.net/10451/63629; eng; info:eu-repo/semantics/openAccess; reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP); instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; instacron:RCAAP; 2025-03-17T15:13:12Z; oai:repositorio.ulisboa.pt:10451/63629; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; info@rcaap.pt; opendoar:https://opendoar.ac.uk/repository/7160; 2025-05-29T03:36:56.608151; Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; false |
| dc.title.none.fl_str_mv | A Hybrid Machine Learning System for Vulnerability Detection in Web Applications |
| title | A Hybrid Machine Learning System for Vulnerability Detection in Web Applications |
| spellingShingle | A Hybrid Machine Learning System for Vulnerability Detection in Web Applications; Oliveira, Miguel César de Albuquerque; web vulnerability detection; machine learning; anomaly detection; natural language processing; software security; Master's theses - 2024; Departamento de Informática |
| author | Oliveira, Miguel César de Albuquerque |
| author_facet | Oliveira, Miguel César de Albuquerque |
| author_role | author |
| dc.contributor.none.fl_str_mv | Medeiros, Ibéria Vitória de Sousa, 1971-; Repositório da Universidade de Lisboa |
| dc.contributor.author.fl_str_mv | Oliveira, Miguel César de Albuquerque |
| dc.subject.por.fl_str_mv | web vulnerability detection; machine learning; anomaly detection; natural language processing; software security; Master's theses - 2024; Departamento de Informática |
| topic | web vulnerability detection; machine learning; anomaly detection; natural language processing; software security; Master's theses - 2024; Departamento de Informática |
| description | Master's thesis, Data Science, 2023, Universidade de Lisboa, Faculdade de Ciências |
| publishDate | 2023 |
| dc.date.none.fl_str_mv | 2023; 2024-03-21T10:29:12Z; 2024; 2024-01-01T00:00:00Z |
| dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv | info:eu-repo/semantics/masterThesis |
| format | masterThesis |
| status_str | publishedVersion |
| dc.identifier.uri.fl_str_mv | http://hdl.handle.net/10451/63629 |
| url | http://hdl.handle.net/10451/63629 |
| dc.language.iso.fl_str_mv | eng |
| language | eng |
| dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
| eu_rights_str_mv | openAccess |
| dc.format.none.fl_str_mv | application/pdf |
| dc.source.none.fl_str_mv | reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP); instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; instacron:RCAAP |
| instname_str | FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| instacron_str | RCAAP |
| institution | RCAAP |
| reponame_str | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| collection | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository.name.fl_str_mv | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| repository.mail.fl_str_mv | info@rcaap.pt |
| _version_ | 1833601766250250240 |
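
The layered architecture summarized in the abstract (anomaly detection, then confirmation, then vulnerability-type classification) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: scikit-learn is assumed as the library, `HashingVectorizer` stands in for the Doc2Vec embeddings the thesis reports using, and the class name `HybridDetector`, the hyperparameters, the toy PHP-like snippets, the labels `sqli`/`xss`, and the choice to fit the OCSVM on benign snippets only are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import OneClassSVM


class HybridDetector:
    """Hypothetical three-layer pipeline: OCSVM -> Random Forest -> Logistic Regression."""

    def __init__(self, nu=0.1, seed=0):
        # Assumption: a stateless token-hashing vectorizer replaces Doc2Vec here.
        self.vectorizer = HashingVectorizer(n_features=256)
        self.anomaly = OneClassSVM(kernel="rbf", nu=nu, gamma="scale")
        self.confirm = RandomForestClassifier(n_estimators=50, random_state=seed)
        self.typer = LogisticRegression(max_iter=1000)

    def _embed(self, snippets):
        # Turn each code snippet into a fixed-length dense feature vector.
        return self.vectorizer.transform(snippets).toarray()

    def fit(self, benign, vulnerable, vuln_types):
        xb, xv = self._embed(benign), self._embed(vulnerable)
        # Layer 1: learn what "normal" code looks like (assumed: benign only).
        self.anomaly.fit(xb)
        # Layer 2: binary vulnerable / not-vulnerable classifier.
        x = np.vstack([xb, xv])
        y = np.array([0] * len(benign) + [1] * len(vulnerable))
        self.confirm.fit(x, y)
        # Layer 3: vulnerability-type classifier, trained on vulnerable code only.
        self.typer.fit(xv, vuln_types)
        return self

    def predict(self, snippets):
        x = self._embed(snippets)
        labels = ["clean"] * len(snippets)
        # Only snippets flagged as anomalies (-1) move on to layer 2.
        flagged = np.where(self.anomaly.predict(x) == -1)[0]
        if flagged.size:
            # Only anomalies confirmed as vulnerable move on to layer 3.
            confirmed = flagged[self.confirm.predict(x[flagged]) == 1]
            if confirmed.size:
                for i, t in zip(confirmed, self.typer.predict(x[confirmed])):
                    labels[i] = str(t)
        return labels


# Toy training data (hypothetical, far too small for meaningful results).
benign = [
    "$total = $price * $qty;",
    "echo htmlspecialchars($name);",
    "$stmt = $pdo->prepare($sql);",
    "return strlen($text);",
]
vulnerable = [
    "mysql_query('SELECT * FROM t WHERE id=' . $_GET['id']);",
    "mysqli_query($c, 'DELETE FROM t WHERE n=' . $_POST['n']);",
    "echo $_GET['msg'];",
    "print $_REQUEST['q'];",
]
types = ["sqli", "sqli", "xss", "xss"]

detector = HybridDetector().fit(benign, vulnerable, types)
labels = detector.predict(["echo $_GET['name'];", "$total = $a + $b;"])
```

Each entry in `labels` is either `clean` or one of the training type labels. Chaining the layers this way means a snippet is only typed after both the anomaly detector and the confirmation model agree, which trades some recall for fewer false positives; the actual thresholds and features would need to follow the thesis.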