Bibliographic details

Year of defense: 2023
Main author: Alves Neto, Antônio José
Advisor: Ordonez, Edward David Moreno
Defense committee: Not informed by the institution
Document type: Master's thesis (Dissertação)
Access type: Open access
Language: Portuguese (por)
Defense institution: Not informed by the institution
Graduate program: Pós-Graduação em Ciência da Computação (Graduate Program in Computer Science)
Department: Not informed by the institution
Country: Not informed by the institution
Keywords in Portuguese:
Keywords in English:
CNPq knowledge area:
Access link: https://ri.ufs.br/jspui/handle/riufs/18318

Abstract:
Currently, with the exponential advancement of technology, a large amount of data is generated daily. These data are not generated only by people: a wide range of electronic equipment has also become a major generator. These large volumes of data are known as Big Data and yield valuable, helpful information for business intelligence, forecasting, and decision support, among other possibilities. However, processing this large volume of data requires a computational approach different from the traditional one, called High Performance Computing (HPC). Over the years, HPC has relied on supercomputers or computing clusters. The former is no longer an option due to its high cost and maintenance difficulty, making clustering an ideal alternative. Clusters are loosely coupled systems formed by a set of computers that work in collaboration with each other using message-passing libraries. In addition, clusters built from Single Board Computers (SBCs) are a viable alternative for research in this area. Among SBCs, the Raspberry Pi stands out: an SBC initially developed to promote the teaching of computer science, whose variety of models meets several specific requirements without requiring large investments. To operate on and process this large volume of data in a cluster, a Big Data platform is necessary, Apache Hadoop being one of the most widely used today. Thus, a good solution for obtaining a low-cost Big Data cluster is to combine the Raspberry Pi as the hardware structure with Apache Hadoop as the Big Data platform. However, the lack of detailed material explaining all the installation steps, the configuration process, and, finally, the verification that the Hadoop cluster is working correctly is a problem little explored by the academic community. Likewise, the monitoring of cluster resources is rarely addressed in academia.
To address this problem, this work aims to develop and evaluate the performance of a low-cost Big Data cluster, using the Raspberry Pi as the low-cost hardware structure and Apache Hadoop as the Big Data platform. The evaluation uses benchmarks widespread in the area (TeraSort and TestDFSIO), and cluster resource usage is tracked and monitored with the tools Zabbix and Grafana, providing complete and detailed documentation of this entire process.
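For context, the two benchmarks named above ship with standard Hadoop distributions and are typically invoked as sketched below. This is a minimal illustration, not the dissertation's procedure: the jar file names and wildcard paths vary by Hadoop version, and the row counts, file counts, and file sizes here are illustrative placeholders, not the parameters used in the study.

```shell
# TeraSort workflow: generate input rows, sort them, then validate the output.
# 10,000,000 rows of 100 bytes each (~1 GB) is an illustrative workload size.
hadoop jar "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    teragen 10000000 /benchmarks/teragen
hadoop jar "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    terasort /benchmarks/teragen /benchmarks/terasort
hadoop jar "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    teravalidate /benchmarks/terasort /benchmarks/teravalidate

# TestDFSIO: measure HDFS write and read throughput over a set of files.
hadoop jar "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
    TestDFSIO -write -nrFiles 10 -fileSize 128MB
hadoop jar "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
    TestDFSIO -read -nrFiles 10 -fileSize 128MB
```

Both benchmarks print elapsed time and throughput figures on completion, which is what makes them convenient for comparing low-cost cluster configurations. These commands require a running Hadoop cluster with HDFS and YARN configured.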