Mitigating Virtualization Failures Through Migration to a Co-Located Hypervisor
Main Author: | Cerveira, Frederico |
---|---|
Publication Date: | 2021 |
Other Authors: | Barbosa, Raul, Madeira, Henrique |
Format: | Article |
Language: | eng |
Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
Download full: | https://hdl.handle.net/10316/100878 https://doi.org/10.1109/ACCESS.2021.3098644 |
Summary: | Many organizations are moving their systems to the cloud, where providers consolidate multiple clients using virtualization, which creates challenges for business-critical applications. Research has shown that hypervisors fail, often causing common-mode failures that may abruptly disrupt dozens of virtual machines simultaneously. We hypothesize and empirically show that a significant percentage of virtual machines affected by a hypervisor failure are capable of continuing execution on a new hypervisor. Supported by this observation, we design a technique for recovering from hypervisor failures through efficient virtual machine migration to a co-located hypervisor, which allows virtual machines to continue executing with minimal downtime and which can be transparently applied to existing applications. We evaluate a proof-of-concept implementation using fault injection of hardware and software faults and show that it can recover, on average, 41-46% of all virtual machines, with a mean virtual machine downtime of 3 seconds. |
id |
RCAP_12cd038dc228ec7f279a6aed8c08c056 |
---|---|
oai_identifier_str |
oai:estudogeral.uc.pt:10316/100878 |
network_acronym_str |
RCAP |
network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository_id_str |
https://opendoar.ac.uk/repository/7160 |
spelling |
Mitigating Virtualization Failures Through Migration to a Co-Located Hypervisor. Keywords: Cloud computing; dependability; fault injection; fault tolerance; virtualization. Funding: FCT Grant ECSEL/0018/2019 and FCT Ph.D. Grant SFRH/BD/130601/2017. European Social Fund, through the Regional Operational Program Centro 2020. Autonomic Service Operation (AESOP) Project under Grant P2020-31/SI/2017. AESOP Grant 040004. Electronic Components and Systems for European Leadership (ECSEL) Joint Undertaking (JU) under Grant 876852. JU from the European Union's Horizon 2020 Research and Innovation Programme and Austria, Czech Republic, Germany, Ireland, Italy, Portugal, Spain, Sweden, and Turkey.
2021; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article; https://hdl.handle.net/10316/100878; https://doi.org/10.1109/ACCESS.2021.3098644; eng; 2169-3536; Cerveira, Frederico; Barbosa, Raul; Madeira, Henrique; info:eu-repo/semantics/openAccess; reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP); instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; instacron:RCAAP; 2025-02-06T14:25:58Z; oai:estudogeral.uc.pt:10316/100878; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; info@rcaap.pt; opendoar:https://opendoar.ac.uk/repository/7160; 2025-05-29T05:50:03.851646; Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; false |
dc.title.none.fl_str_mv |
Mitigating Virtualization Failures Through Migration to a Co-Located Hypervisor |
title |
Mitigating Virtualization Failures Through Migration to a Co-Located Hypervisor |
spellingShingle |
Mitigating Virtualization Failures Through Migration to a Co-Located Hypervisor; Cerveira, Frederico; Cloud computing; dependability; fault injection; fault tolerance; virtualization |
author |
Cerveira, Frederico |
author_facet |
Cerveira, Frederico; Barbosa, Raul; Madeira, Henrique |
author_role |
author |
author2 |
Barbosa, Raul; Madeira, Henrique |
author2_role |
author author |
dc.contributor.author.fl_str_mv |
Cerveira, Frederico; Barbosa, Raul; Madeira, Henrique |
dc.subject.por.fl_str_mv |
Cloud computing; dependability; fault injection; fault tolerance; virtualization |
topic |
Cloud computing; dependability; fault injection; fault tolerance; virtualization |
description |
Many organizations are moving their systems to the cloud, where providers consolidate multiple clients using virtualization, which creates challenges for business-critical applications. Research has shown that hypervisors fail, often causing common-mode failures that may abruptly disrupt dozens of virtual machines simultaneously. We hypothesize and empirically show that a significant percentage of virtual machines affected by a hypervisor failure are capable of continuing execution on a new hypervisor. Supported by this observation, we design a technique for recovering from hypervisor failures through efficient virtual machine migration to a co-located hypervisor, which allows virtual machines to continue executing with minimal downtime and which can be transparently applied to existing applications. We evaluate a proof-of-concept implementation using fault injection of hardware and software faults and show that it can recover, on average, 41-46% of all virtual machines, with a mean virtual machine downtime of 3 seconds. |
publishDate |
2021 |
dc.date.none.fl_str_mv |
2021 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://hdl.handle.net/10316/100878 https://doi.org/10.1109/ACCESS.2021.3098644 |
url |
https://hdl.handle.net/10316/100878 https://doi.org/10.1109/ACCESS.2021.3098644 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
2169-3536 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP); instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; instacron:RCAAP |
instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
repository.mail.fl_str_mv |
info@rcaap.pt |
_version_ |
1833602489161613312 |