Scalable Consistency for Data Replication
Main Author: | Fouto, Pedro Filipe Veiga |
---|---|
Publication Date: | 2024 |
Language: | eng |
Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
Download full: | http://hdl.handle.net/10362/179323 |
Summary: | Distributed data storage solutions are key components of large-scale Internet services. The consistency guarantees provided by the protocols used to replicate data in such solutions can vary greatly, with some protocols providing strong guarantees while sacrificing availability, and others providing weaker guarantees while allowing higher availability. The choice of the replication protocol to use is often tied to the physical distribution of data replicas. As a general rule for Internet services, data replicated within a single datacenter is often replicated using protocols providing stronger consistency guarantees, while data replicated across multiple datacenters (i.e., geo-replicated) is often replicated using protocols providing weaker consistency guarantees. However, designing performant and fault-tolerant data replication solutions with data consistency that can scale in number of replicas is a challenging task. This is true not only for strong consistency models, but also for weaker consistency models, such as causal (and causal+) consistency. In this thesis, we propose to address the scalability and fault-tolerance limitations of providing consistency guarantees in data replication solutions, by addressing the entire spectrum of replication deployment scenarios, from single datacenter deployments, to geo-replicated, and edge computing deployments. To this end, we propose three main contributions, one for each of these deployment scenarios. First, we propose a new state-machine replication protocol, based on a new variant of Paxos, which improves on existing solutions by maximizing the throughput while avoiding performance degradation when increasing the number of replicas. Second, we leverage the properties of the proposed protocol to replicate key components in geo-replicated causal consistency solutions, overcoming their fault-tolerance limitations while maximizing their performance. Third, by addressing the challenges of the edge computing environment, we propose a causal consistency data management solution that can efficiently scale to hundreds of edge locations. |
id |
RCAP_65546f3ce3385f3aafe82d81b4ca97c3 |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/179323 |
network_acronym_str |
RCAP |
network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository_id_str |
https://opendoar.ac.uk/repository/7160 |
dc.title.none.fl_str_mv |
Scalable Consistency for Data Replication |
title |
Scalable Consistency for Data Replication |
title_short |
Scalable Consistency for Data Replication |
title_full |
Scalable Consistency for Data Replication |
title_fullStr |
Scalable Consistency for Data Replication |
title_full_unstemmed |
Scalable Consistency for Data Replication |
title_sort |
Scalable Consistency for Data Replication |
author |
Fouto, Pedro Filipe Veiga |
author_facet |
Fouto, Pedro Filipe Veiga |
author_role |
author |
dc.contributor.none.fl_str_mv |
Leitão, João; Preguiça, Nuno; RUN |
dc.contributor.author.fl_str_mv |
Fouto, Pedro Filipe Veiga |
dc.subject.por.fl_str_mv |
Data replication; Causal consistency; State machine replication; Edge computing; Geo-replication; Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática |
topic |
Data replication; Causal consistency; State machine replication; Edge computing; Geo-replication; Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática |
description |
Distributed data storage solutions are key components of large-scale Internet services. The consistency guarantees provided by the protocols used to replicate data in such solutions can vary greatly, with some protocols providing strong guarantees while sacrificing availability, and others providing weaker guarantees while allowing higher availability. The choice of the replication protocol to use is often tied to the physical distribution of data replicas. As a general rule for Internet services, data replicated within a single datacenter is often replicated using protocols providing stronger consistency guarantees, while data replicated across multiple datacenters (i.e., geo-replicated) is often replicated using protocols providing weaker consistency guarantees. However, designing performant and fault-tolerant data replication solutions with data consistency that can scale in number of replicas is a challenging task. This is true not only for strong consistency models, but also for weaker consistency models, such as causal (and causal+) consistency. In this thesis, we propose to address the scalability and fault-tolerance limitations of providing consistency guarantees in data replication solutions, by addressing the entire spectrum of replication deployment scenarios, from single datacenter deployments, to geo-replicated, and edge computing deployments. To this end, we propose three main contributions, one for each of these deployment scenarios. First, we propose a new state-machine replication protocol, based on a new variant of Paxos, which improves on existing solutions by maximizing the throughput while avoiding performance degradation when increasing the number of replicas. Second, we leverage the properties of the proposed protocol to replicate key components in geo-replicated causal consistency solutions, overcoming their fault-tolerance limitations while maximizing their performance. Third, by addressing the challenges of the edge computing environment, we propose a causal consistency data management solution that can efficiently scale to hundreds of edge locations. |
publishDate |
2024 |
dc.date.none.fl_str_mv |
2024; 2024-01-01T00:00:00Z; 2025-02-19T13:44:49Z |
dc.type.driver.fl_str_mv |
doctoral thesis |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/179323 |
url |
http://hdl.handle.net/10362/179323 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP); instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; instacron:RCAAP |
instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
repository.mail.fl_str_mv |
info@rcaap.pt |
_version_ |
1833598785279754240 |