Dependability mechanisms for desktop grids

Bibliographic Details
Main Author: Domingues, Patrício Rodrigues
Publication Date: 2008
Language: por
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: http://hdl.handle.net/10400.8/118
Summary: It is a well-known fact that most of the computing power spread over the Internet simply goes unused, with CPU and other resources sitting idle most of the time: on average less than 5% of the CPU time is effectively used. Desktop grids are software infrastructures that aim to exploit the otherwise idle processing power, making it available to users that require computational resources to solve longrunning applications. The outcome of some efforts to harness idle machines can be seen in public projects such as SETI@home and Folding@home that boost impressive performance figures, in the order of several hundreds of TFLOPS each. At the same time, many institutions, both academic and corporate, run their own desktop grid platforms. However, while desktop grids provide free computing power, they need to face important issues like fault tolerance and security, two of the main problems that harden the widespread use of desktop grid computing. In this thesis, we aim to exploit a set of fault tolerance techniques, such as checkpointing and redundant executions, to promote faster turnaround times. We start with an experimental study, where we analyze the availability of the computing resources of an academic institution. We then focus on the benefits of sharing checkpoints in both institutional and wide-scale environments. We also explore hybrid schemes, where the traditional centralized desktop grid organization is complemented with peer-to-peer resources. Another major issue regarding desktop grids is related with the level of trust that can be achieved relatively to the volunteered hosts that carry out the executions. We propose and explore several mechanisms aimed at reducing the waste of computational resources needed to detect incorrect computations. For this purpose, we detail a checkpoint-based scheme for early detection of errors. We also propose and analyze an invitation-based strategy coupled to a credit rewarding scheme, to allow the enrollment and filtering of more trustworthy and more motivated resource donors. To summarize, we propose and study several fault tolerance methodologies oriented toward a more efficient usage of resources, resorting to techniques such as checkpointing, replication and sabotage tolerance to fasten and to make more reliable executions that are carried over desktop grid resources. The usage of techniques like these ones will be of ultimate importance for the wider deployment of applications over desktop grids.
id RCAP_6fd4382069b31b30bfa311d57503946f
oai_identifier_str oai:iconline.ipleiria.pt:10400.8/118
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
spelling Dependability mechanisms for desktop gridsFault toleranceDesktop gridsVolunteer computingCheckpointingSchedulingSabotage-toleranceIt is a well-known fact that most of the computing power spread over the Internet simply goes unused, with CPU and other resources sitting idle most of the time: on average less than 5% of the CPU time is effectively used. Desktop grids are software infrastructures that aim to exploit the otherwise idle processing power, making it available to users that require computational resources to solve longrunning applications. The outcome of some efforts to harness idle machines can be seen in public projects such as SETI@home and Folding@home that boost impressive performance figures, in the order of several hundreds of TFLOPS each. At the same time, many institutions, both academic and corporate, run their own desktop grid platforms. However, while desktop grids provide free computing power, they need to face important issues like fault tolerance and security, two of the main problems that harden the widespread use of desktop grid computing. In this thesis, we aim to exploit a set of fault tolerance techniques, such as checkpointing and redundant executions, to promote faster turnaround times. We start with an experimental study, where we analyze the availability of the computing resources of an academic institution. We then focus on the benefits of sharing checkpoints in both institutional and wide-scale environments. We also explore hybrid schemes, where the traditional centralized desktop grid organization is complemented with peer-to-peer resources. Another major issue regarding desktop grids is related with the level of trust that can be achieved relatively to the volunteered hosts that carry out the executions. We propose and explore several mechanisms aimed at reducing the waste of computational resources needed to detect incorrect computations. For this purpose, we detail a checkpoint-based scheme for early detection of errors. We also propose and analyze an invitation-based strategy coupled to a credit rewarding scheme, to allow the enrollment and filtering of more trustworthy and more motivated resource donors. To summarize, we propose and study several fault tolerance methodologies oriented toward a more efficient usage of resources, resorting to techniques such as checkpointing, replication and sabotage tolerance to fasten and to make more reliable executions that are carried over desktop grid resources. The usage of techniques like these ones will be of ultimate importance for the wider deployment of applications over desktop grids.Repositório IC-OnlineDomingues, Patrício Rodrigues2009-08-14T14:33:09Z20082008-01-01T00:00:00Zdoctoral thesisinfo:eu-repo/semantics/publishedVersionapplication/pdfhttp://hdl.handle.net/10400.8/118urn:tid:101184638porinfo:eu-repo/semantics/openAccessreponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiainstacron:RCAAP2025-02-25T15:11:36Zoai:iconline.ipleiria.pt:10400.8/118Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireinfo@rcaap.ptopendoar:https://opendoar.ac.uk/repository/71602025-05-28T20:50:42.461157Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiafalse
dc.title.none.fl_str_mv Dependability mechanisms for desktop grids
title Dependability mechanisms for desktop grids
spellingShingle Dependability mechanisms for desktop grids
Domingues, Patrício Rodrigues
Fault tolerance
Desktop grids
Volunteer computing
Checkpointing
Scheduling
Sabotage-tolerance
title_short Dependability mechanisms for desktop grids
title_full Dependability mechanisms for desktop grids
title_fullStr Dependability mechanisms for desktop grids
title_full_unstemmed Dependability mechanisms for desktop grids
title_sort Dependability mechanisms for desktop grids
author Domingues, Patrício Rodrigues
author_facet Domingues, Patrício Rodrigues
author_role author
dc.contributor.none.fl_str_mv Repositório IC-Online
dc.contributor.author.fl_str_mv Domingues, Patrício Rodrigues
dc.subject.por.fl_str_mv Fault tolerance
Desktop grids
Volunteer computing
Checkpointing
Scheduling
Sabotage-tolerance
topic Fault tolerance
Desktop grids
Volunteer computing
Checkpointing
Scheduling
Sabotage-tolerance
description It is a well-known fact that most of the computing power spread over the Internet simply goes unused, with CPU and other resources sitting idle most of the time: on average less than 5% of the CPU time is effectively used. Desktop grids are software infrastructures that aim to exploit the otherwise idle processing power, making it available to users that require computational resources to solve longrunning applications. The outcome of some efforts to harness idle machines can be seen in public projects such as SETI@home and Folding@home that boost impressive performance figures, in the order of several hundreds of TFLOPS each. At the same time, many institutions, both academic and corporate, run their own desktop grid platforms. However, while desktop grids provide free computing power, they need to face important issues like fault tolerance and security, two of the main problems that harden the widespread use of desktop grid computing. In this thesis, we aim to exploit a set of fault tolerance techniques, such as checkpointing and redundant executions, to promote faster turnaround times. We start with an experimental study, where we analyze the availability of the computing resources of an academic institution. We then focus on the benefits of sharing checkpoints in both institutional and wide-scale environments. We also explore hybrid schemes, where the traditional centralized desktop grid organization is complemented with peer-to-peer resources. Another major issue regarding desktop grids is related with the level of trust that can be achieved relatively to the volunteered hosts that carry out the executions. We propose and explore several mechanisms aimed at reducing the waste of computational resources needed to detect incorrect computations. For this purpose, we detail a checkpoint-based scheme for early detection of errors. We also propose and analyze an invitation-based strategy coupled to a credit rewarding scheme, to allow the enrollment and filtering of more trustworthy and more motivated resource donors. To summarize, we propose and study several fault tolerance methodologies oriented toward a more efficient usage of resources, resorting to techniques such as checkpointing, replication and sabotage tolerance to fasten and to make more reliable executions that are carried over desktop grid resources. The usage of techniques like these ones will be of ultimate importance for the wider deployment of applications over desktop grids.
publishDate 2008
dc.date.none.fl_str_mv 2008
2008-01-01T00:00:00Z
2009-08-14T14:33:09Z
dc.type.driver.fl_str_mv doctoral thesis
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10400.8/118
urn:tid:101184638
url http://hdl.handle.net/10400.8/118
identifier_str_mv urn:tid:101184638
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833598904847826944