Arquitetura dinâmica para o balanceamento de réplicas em sistemas de arquivos distribuídos

Bibliographic details
Year of defense: 2022
Main author: Fazul, Rhauani Weber Aita
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: Portuguese (por)
Defending institution: Universidade Federal de Santa Maria
Brazil
Computer Science
UFSM
Programa de Pós-Graduação em Ciência da Computação
Centro de Tecnologia
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://repositorio.ufsm.br/handle/1/26470
Abstract: Distributed file systems are essential to support applications that handle large volumes of data. One of the most widely used is HDFS, Apache Hadoop’s Distributed File System. Data replication, which is at the core of the HDFS storage model, is essential for fault tolerance and performance, since the placement of data across the cluster directly affects replica balancing and data locality. As new data is loaded into the system, it is common for the distribution of replicas among the nodes to become unbalanced. HDFS Balancer is the official solution for data balancing, which it performs by rearranging the replicas already stored in the cluster. Nevertheless, its current balancing policy does not address the characteristics and specific needs of the applications during data rearrangement. Moreover, it is up to the system administrator to monitor the HDFS status and, when considered necessary, run the balancer daemon, which creates a dependency that is inadequate and inefficient in many situations. To address these limitations, this work presents DARB, a dynamic architecture that promotes reactive and proactive replica balancing. The reactive strategy arises from the PRBP, a customized and prioritized replica balancing policy for the HDFS Balancer. The PRBP is based on an adaptable and configurable system of priorities, from which association rules were defined to allow the use of multiple priorities simultaneously. Along with the rules, a set of usage guidelines was formalized and evaluated through practical experiments, which validated the behavior and applicability of the PRBP. The proactive strategy of DARB, in contrast, is an event-driven approach that makes the replica balancing process in HDFS transparent. To this end, a metrics observation model and a supporting structure were created to automatically determine when corrective actions should be taken and to trigger the balancing process in the file system based on standardized trigger events.
The evaluation results indicate that the proposed solution removes the need to configure and run the HDFS Balancer manually, while actively keeping the cluster balanced with respect to performance, reliability, and data availability. DARB is thus a specialized solution that makes the balancing process more flexible and introduces the concept of context-aware replica balancing to HDFS.