Mitigating Virtualization Failures Through Migration to a Co-Located Hypervisor

Bibliographic Details
Main Author: Cerveira, Frederico
Publication Date: 2021
Other Authors: Barbosa, Raul; Madeira, Henrique
Format: Article
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: https://hdl.handle.net/10316/100878
https://doi.org/10.1109/ACCESS.2021.3098644
Summary: Many organizations are moving their systems to the cloud, where providers consolidate multiple clients using virtualization, which creates challenges for business-critical applications. Research has shown that hypervisors fail, often causing common-mode failures that may abruptly disrupt dozens of virtual machines simultaneously. We hypothesize and empirically show that a significant percentage of virtual machines affected by a hypervisor failure are capable of continuing execution on a new hypervisor. Supported by this observation, we design a technique for recovering from hypervisor failures through efficient virtual machine migration to a co-located hypervisor, which allows virtual machines to continue executing with minimal downtime and which can be transparently applied to existing applications. We evaluate a proof-of-concept implementation using fault injection of hardware and software faults and show that it can recover, on average, 41-46% of all virtual machines, with a mean virtual machine downtime of 3 seconds.
id RCAP_12cd038dc228ec7f279a6aed8c08c056
oai_identifier_str oai:estudogeral.uc.pt:10316/100878
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
Funding: FCT Grant ECSEL/0018/2019 and FCT Ph.D. Grant SFRH/BD/130601/2017. European Social Fund, through the Regional Operational Program Centro 2020. Autonomic Service Operation (AESOP) Project under Grant P2020-31/SI/2017. AESOP Grant 040004. Electronic Components and Systems for European Leadership (ECSEL) Joint Undertaking (JU) under Grant 876852. JU from the European Union's Horizon 2020 Research and Innovation Programme and Austria, Czech Republic, Germany, Ireland, Italy, Portugal, Spain, Sweden, and Turkey.
topic Cloud computing; dependability; fault injection; fault tolerance; virtualization
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.relation.none.fl_str_mv 2169-3536
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt