Scalable Consistency for Data Replication

Bibliographic Details
Main Author: Fouto, Pedro Filipe Veiga
Publication Date: 2024
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: http://hdl.handle.net/10362/179323
Summary: Distributed data storage solutions are key components of large-scale Internet services. The consistency guarantees provided by the protocols used to replicate data in such solutions can vary greatly, with some protocols providing strong guarantees while sacrificing availability, and others providing weaker guarantees while allowing higher availability. The choice of the replication protocol to use is often tied to the physical distribution of data replicas. As a general rule for Internet services, data replicated within a single datacenter is often replicated using protocols providing stronger consistency guarantees, while data replicated across multiple datacenters (i.e., geo-replicated) is often replicated using protocols providing weaker consistency guarantees. However, designing performant and fault-tolerant data replication solutions with data consistency that can scale in number of replicas is a challenging task. This is true not only for strong consistency models, but also for weaker consistency models, such as causal (and causal+) consistency. In this thesis, we propose to address the scalability and fault-tolerance limitations of providing consistency guarantees in data replication solutions, by addressing the entire spectrum of replication deployment scenarios, from single datacenter deployments, to geo-replicated, and edge computing deployments. To this end, we propose three main contributions, one for each of these deployment scenarios. First, we propose a new state-machine replication protocol, based on a new variant of Paxos, which improves on existing solutions by maximizing the throughput while avoiding performance degradation when increasing the number of replicas. Second, we leverage the properties of the proposed protocol to replicate key components in geo-replicated causal consistency solutions, overcoming their fault-tolerance limitations while maximizing their performance. Third, by addressing the challenges of the edge computing environment, we propose a causal consistency data management solution that can efficiently scale to hundreds of edge locations.
id RCAP_65546f3ce3385f3aafe82d81b4ca97c3
oai_identifier_str oai:run.unl.pt:10362/179323
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
dc.title.none.fl_str_mv Scalable Consistency for Data Replication
title Scalable Consistency for Data Replication
spellingShingle Scalable Consistency for Data Replication
Fouto, Pedro Filipe Veiga
Data replication
Causal consistency
State machine replication
Edge computing
Geo-replication
Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
title_short Scalable Consistency for Data Replication
title_full Scalable Consistency for Data Replication
title_fullStr Scalable Consistency for Data Replication
title_full_unstemmed Scalable Consistency for Data Replication
title_sort Scalable Consistency for Data Replication
author Fouto, Pedro Filipe Veiga
author_facet Fouto, Pedro Filipe Veiga
author_role author
dc.contributor.none.fl_str_mv Leitão, João
Preguiça, Nuno
RUN
dc.contributor.author.fl_str_mv Fouto, Pedro Filipe Veiga
dc.subject.por.fl_str_mv Data replication
Causal consistency
State machine replication
Edge computing
Geo-replication
Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
topic Data replication
Causal consistency
State machine replication
Edge computing
Geo-replication
Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
description Distributed data storage solutions are key components of large-scale Internet services. The consistency guarantees provided by the protocols used to replicate data in such solutions can vary greatly, with some protocols providing strong guarantees while sacrificing availability, and others providing weaker guarantees while allowing higher availability. The choice of the replication protocol to use is often tied to the physical distribution of data replicas. As a general rule for Internet services, data replicated within a single datacenter is often replicated using protocols providing stronger consistency guarantees, while data replicated across multiple datacenters (i.e., geo-replicated) is often replicated using protocols providing weaker consistency guarantees. However, designing performant and fault-tolerant data replication solutions with data consistency that can scale in number of replicas is a challenging task. This is true not only for strong consistency models, but also for weaker consistency models, such as causal (and causal+) consistency. In this thesis, we propose to address the scalability and fault-tolerance limitations of providing consistency guarantees in data replication solutions, by addressing the entire spectrum of replication deployment scenarios, from single datacenter deployments, to geo-replicated, and edge computing deployments. To this end, we propose three main contributions, one for each of these deployment scenarios. First, we propose a new state-machine replication protocol, based on a new variant of Paxos, which improves on existing solutions by maximizing the throughput while avoiding performance degradation when increasing the number of replicas. Second, we leverage the properties of the proposed protocol to replicate key components in geo-replicated causal consistency solutions, overcoming their fault-tolerance limitations while maximizing their performance. Third, by addressing the challenges of the edge computing environment, we propose a causal consistency data management solution that can efficiently scale to hundreds of edge locations.
publishDate 2024
dc.date.none.fl_str_mv 2024
2024-01-01T00:00:00Z
2025-02-19T13:44:49Z
dc.type.driver.fl_str_mv doctoral thesis
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/179323
url http://hdl.handle.net/10362/179323
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833598785279754240