Online Failure Prediction Through Fault Injection and Machine Learning: Methodology and Case Study

Bibliographic Details
Main Author: Campos, João R.
Publication Date: 2023
Other Authors: Costa, Ernesto; Vieira, Marco
Format: Article
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: https://hdl.handle.net/10316/117486
https://doi.org/10.1109/ISSRE59848.2023.00021
Summary: Online Failure Prediction (OFP) is a technique that attempts to predict incoming failures to mitigate their consequences. Machine Learning (ML) has been successfully used to create predictive models for OFP, but failures are rare, and thus failure data are typically not available. Fault injection has been accepted as a viable alternative to generate failure data. However, this raises several challenges, such as how to process the data to create and assess predictive models. The characteristics of fault injection campaigns (e.g., repeated/controlled experiments) and OFP (e.g., autocorrelation) require specific considerations. This work proposes a six-stage methodology for using failure data generated through fault injection to create accurate/representative models for OFP. It overviews the various phases, from generating and processing the data, to creating and assessing the performance of the models up to their deployment, while considering the intrinsic characteristics of the problem. As a case study, we apply the methodology to develop failure predictors for the Linux Operating System (OS). Results show that the proposed methodology led to accurate predictive models that could also generalize to failures that occur under different execution profiles, whilst using traditional techniques resulted in over-optimistic observations.
id RCAP_fc7b09d52464e8c4134e20a6769abb22
oai_identifier_str oai:estudogeral.uc.pt:10316/117486
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
spelling Online Failure Prediction Through Fault Injection and Machine Learning: Methodology and Case Study
Dependability
Failure Prediction
Fault Injection
Machine Learning
Online Failure Prediction (OFP) is a technique that attempts to predict incoming failures to mitigate their consequences. Machine Learning (ML) has been successfully used to create predictive models for OFP, but failures are rare, and thus failure data are typically not available. Fault injection has been accepted as a viable alternative to generate failure data. However, this raises several challenges, such as how to process the data to create and assess predictive models. The characteristics of fault injection campaigns (e.g., repeated/controlled experiments) and OFP (e.g., autocorrelation) require specific considerations. This work proposes a six-stage methodology for using failure data generated through fault injection to create accurate/representative models for OFP. It overviews the various phases, from generating and processing the data, to creating and assessing the performance of the models up to their deployment, while considering the intrinsic characteristics of the problem. As a case study, we apply the methodology to develop failure predictors for the Linux Operating System (OS). Results show that the proposed methodology led to accurate predictive models that could also generalize to failures that occur under different execution profiles, whilst using traditional techniques resulted in over-optimistic observations.
This work has been partially supported by Project “Agenda Mobilizadora Sines Nexus” (ref. No. 7113), supported by the Recovery and Resilience Plan (PRR) and by the European Funds Next Generation EU, following Notice No. 02/C05-i01/2022, Component 5 - Capitalization and Business Innovation - Mobilizing Agendas for Business Innovation and by the FCT, I.P./MCTES through national funds (PIDDAC), within the scope of CISUC R&D Unit – UIDB/00326/2020 or project code UIDP/00326/2020.
IEEE
2023
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/article
https://hdl.handle.net/10316/117486
https://hdl.handle.net/10316/117486
https://doi.org/10.1109/ISSRE59848.2023.00021
eng
979-8-3503-1594-3
https://ieeexplore.ieee.org/document/10301236
Campos, João R.
Costa, Ernesto
Vieira, Marco
info:eu-repo/semantics/openAccess
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
2024-12-30T09:48:36Z
oai:estudogeral.uc.pt:10316/117486
Portal Agregador
ONG
https://www.rcaap.pt/oai/openaire
info@rcaap.pt
opendoar:https://opendoar.ac.uk/repository/7160
2025-05-29T06:11:25.688907
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
false
dc.title.none.fl_str_mv Online Failure Prediction Through Fault Injection and Machine Learning: Methodology and Case Study
title Online Failure Prediction Through Fault Injection and Machine Learning: Methodology and Case Study
spellingShingle Online Failure Prediction Through Fault Injection and Machine Learning: Methodology and Case Study
Campos, João R.
Dependability
Failure Prediction
Fault Injection
Machine Learning
title_short Online Failure Prediction Through Fault Injection and Machine Learning: Methodology and Case Study
title_full Online Failure Prediction Through Fault Injection and Machine Learning: Methodology and Case Study
title_fullStr Online Failure Prediction Through Fault Injection and Machine Learning: Methodology and Case Study
title_full_unstemmed Online Failure Prediction Through Fault Injection and Machine Learning: Methodology and Case Study
title_sort Online Failure Prediction Through Fault Injection and Machine Learning: Methodology and Case Study
author Campos, João R.
author_facet Campos, João R.
Costa, Ernesto
Vieira, Marco
author_role author
author2 Costa, Ernesto
Vieira, Marco
author2_role author
author
dc.contributor.author.fl_str_mv Campos, João R.
Costa, Ernesto
Vieira, Marco
dc.subject.por.fl_str_mv Dependability
Failure Prediction
Fault Injection
Machine Learning
topic Dependability
Failure Prediction
Fault Injection
Machine Learning
description Online Failure Prediction (OFP) is a technique that attempts to predict incoming failures to mitigate their consequences. Machine Learning (ML) has been successfully used to create predictive models for OFP, but failures are rare, and thus failure data are typically not available. Fault injection has been accepted as a viable alternative to generate failure data. However, this raises several challenges, such as how to process the data to create and assess predictive models. The characteristics of fault injection campaigns (e.g., repeated/controlled experiments) and OFP (e.g., autocorrelation) require specific considerations. This work proposes a six-stage methodology for using failure data generated through fault injection to create accurate/representative models for OFP. It overviews the various phases, from generating and processing the data, to creating and assessing the performance of the models up to their deployment, while considering the intrinsic characteristics of the problem. As a case study, we apply the methodology to develop failure predictors for the Linux Operating System (OS). Results show that the proposed methodology led to accurate predictive models that could also generalize to failures that occur under different execution profiles, whilst using traditional techniques resulted in over-optimistic observations.
publishDate 2023
dc.date.none.fl_str_mv 2023
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://hdl.handle.net/10316/117486
https://doi.org/10.1109/ISSRE59848.2023.00021
url https://hdl.handle.net/10316/117486
https://doi.org/10.1109/ISSRE59848.2023.00021
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 979-8-3503-1594-3
https://ieeexplore.ieee.org/document/10301236
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv IEEE
publisher.none.fl_str_mv IEEE
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833602607490269184