Online Failure Prediction Through Fault Injection and Machine Learning: Methodology and Case Study
Main Author: | Campos, João R. |
---|---|
Publication Date: | 2023 |
Other Authors: | Costa, Ernesto; Vieira, Marco |
Format: | Article |
Language: | eng |
Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
Download full: | https://hdl.handle.net/10316/117486 https://doi.org/10.1109/ISSRE59848.2023.00021 |
Summary: | Online Failure Prediction (OFP) is a technique that attempts to predict incoming failures to mitigate their consequences. Machine Learning (ML) has been successfully used to create predictive models for OFP, but failures are rare, and thus failure data are typically not available. Fault injection has been accepted as a viable alternative to generate failure data. However, this raises several challenges, such as how to process the data to create and assess predictive models. The characteristics of fault injection campaigns (e.g., repeated/controlled experiments) and OFP (e.g., autocorrelation) require specific considerations. This work proposes a six-stage methodology for using failure data generated through fault injection to create accurate/representative models for OFP. It overviews the various phases, from generating and processing the data, to creating and assessing the performance of the models up to their deployment, while considering the intrinsic characteristics of the problem. As a case study, we apply the methodology to develop failure predictors for the Linux Operating System (OS). Results show that the proposed methodology led to accurate predictive models that could also generalize to failures that occur under different execution profiles, whilst using traditional techniques resulted in over-optimistic observations. |
id |
RCAP_fc7b09d52464e8c4134e20a6769abb22 |
---|---|
oai_identifier_str |
oai:estudogeral.uc.pt:10316/117486 |
network_acronym_str |
RCAP |
network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository_id_str |
https://opendoar.ac.uk/repository/7160 |
spelling |
Online Failure Prediction Through Fault Injection and Machine Learning: Methodology and Case Study Dependability Failure Prediction Fault Injection Machine Learning Online Failure Prediction (OFP) is a technique that attempts to predict incoming failures to mitigate their consequences. Machine Learning (ML) has been successfully used to create predictive models for OFP, but failures are rare, and thus failure data are typically not available. Fault injection has been accepted as a viable alternative to generate failure data. However, this raises several challenges, such as how to process the data to create and assess predictive models. The characteristics of fault injection campaigns (e.g., repeated/controlled experiments) and OFP (e.g., autocorrelation) require specific considerations. This work proposes a six-stage methodology for using failure data generated through fault injection to create accurate/representative models for OFP. It overviews the various phases, from generating and processing the data, to creating and assessing the performance of the models up to their deployment, while considering the intrinsic characteristics of the problem. As a case study, we apply the methodology to develop failure predictors for the Linux Operating System (OS). Results show that the proposed methodology led to accurate predictive models that could also generalize to failures that occur under different execution profiles, whilst using traditional techniques resulted in over-optimistic observations. This work has been partially supported by Project “Agenda Mobilizadora Sines Nexus” (ref. No. 7113), supported by the Recovery and Resilience Plan (PRR) and by the European Funds Next Generation EU, following Notice No. 02/C05-i01/2022, Component 5 - Capitalization and Business Innovation - Mobilizing Agendas for Business Innovation and by the FCT, I.P./MCTES through national funds (PIDDAC), within the scope of CISUC R&D Unit – UIDB/00326/2020 or project code UIDP/00326/2020. IEEE 2023 info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/article https://hdl.handle.net/10316/117486 https://hdl.handle.net/10316/117486 https://doi.org/10.1109/ISSRE59848.2023.00021 eng 979-8-3503-1594-3 https://ieeexplore.ieee.org/document/10301236 Campos, João R. Costa, Ernesto Vieira, Marco info:eu-repo/semantics/openAccess reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP 2024-12-30T09:48:36Z oai:estudogeral.uc.pt:10316/117486 Portal Agregador ONG https://www.rcaap.pt/oai/openaire info@rcaap.pt opendoar:https://opendoar.ac.uk/repository/7160 2025-05-29T06:11:25.688907 Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia false |
dc.title.none.fl_str_mv |
Online Failure Prediction Through Fault Injection and Machine Learning: Methodology and Case Study |
title |
Online Failure Prediction Through Fault Injection and Machine Learning: Methodology and Case Study |
spellingShingle |
Online Failure Prediction Through Fault Injection and Machine Learning: Methodology and Case Study Campos, João R. Dependability Failure Prediction Fault Injection Machine Learning |
title_short |
Online Failure Prediction Through Fault Injection and Machine Learning: Methodology and Case Study |
title_full |
Online Failure Prediction Through Fault Injection and Machine Learning: Methodology and Case Study |
title_fullStr |
Online Failure Prediction Through Fault Injection and Machine Learning: Methodology and Case Study |
title_full_unstemmed |
Online Failure Prediction Through Fault Injection and Machine Learning: Methodology and Case Study |
title_sort |
Online Failure Prediction Through Fault Injection and Machine Learning: Methodology and Case Study |
author |
Campos, João R. |
author_facet |
Campos, João R. Costa, Ernesto Vieira, Marco |
author_role |
author |
author2 |
Costa, Ernesto Vieira, Marco |
author2_role |
author author |
dc.contributor.author.fl_str_mv |
Campos, João R. Costa, Ernesto Vieira, Marco |
dc.subject.por.fl_str_mv |
Dependability Failure Prediction Fault Injection Machine Learning |
topic |
Dependability Failure Prediction Fault Injection Machine Learning |
description |
Online Failure Prediction (OFP) is a technique that attempts to predict incoming failures to mitigate their consequences. Machine Learning (ML) has been successfully used to create predictive models for OFP, but failures are rare, and thus failure data are typically not available. Fault injection has been accepted as a viable alternative to generate failure data. However, this raises several challenges, such as how to process the data to create and assess predictive models. The characteristics of fault injection campaigns (e.g., repeated/controlled experiments) and OFP (e.g., autocorrelation) require specific considerations. This work proposes a six-stage methodology for using failure data generated through fault injection to create accurate/representative models for OFP. It overviews the various phases, from generating and processing the data, to creating and assessing the performance of the models up to their deployment, while considering the intrinsic characteristics of the problem. As a case study, we apply the methodology to develop failure predictors for the Linux Operating System (OS). Results show that the proposed methodology led to accurate predictive models that could also generalize to failures that occur under different execution profiles, whilst using traditional techniques resulted in over-optimistic observations. |
publishDate |
2023 |
dc.date.none.fl_str_mv |
2023 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://hdl.handle.net/10316/117486 https://hdl.handle.net/10316/117486 https://doi.org/10.1109/ISSRE59848.2023.00021 |
url |
https://hdl.handle.net/10316/117486 https://doi.org/10.1109/ISSRE59848.2023.00021 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
979-8-3503-1594-3 https://ieeexplore.ieee.org/document/10301236 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
IEEE |
publisher.none.fl_str_mv |
IEEE |
dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP |
instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
repository.mail.fl_str_mv |
info@rcaap.pt |
_version_ |
1833602607490269184 |