Hive on spark and MapReduce : a methodology for parameter tuning

Forster, Rodrigo Richard

Bibliographic Details
Main Author:	Forster, Rodrigo Richard
Publication Date:	2018
Format:	Master thesis
Language:	eng
Source:	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full:	http://hdl.handle.net/10362/52854
Summary:	Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Information Systems and Technologies Management

Item metadata

id	RCAP_ca2dad9351d49acf1ad2b7b94630f2a4
oai_identifier_str	oai:run.unl.pt:10362/52854
network_acronym_str	RCAP
network_name_str	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str	https://opendoar.ac.uk/repository/7160
spelling	Hive on spark and MapReduce : a methodology for parameter tuning Tuning Hive on Spark MapReduce Apache Spark Big Data HDFS Hadoop Data Warehouse Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Information Systems and Technologies Management As the era of “big data” has arrived, more and more companies are starting to use distributed file systems, such as the Hadoop Distributed File System (HDFS), to manage and process their data streams. This software library offers a way to store large files across multiple machines. Large data sets are processed using its inherent programming model, MapReduce. Apache Spark is a relatively new alternative to Hadoop MapReduce and claims to offer a performance boost of up to 10 times for certain applications while maintaining automatic fault tolerance. Apache Hive was introduced to leverage the data warehouse capabilities of Hadoop. It is a concept for Big Data analytics that works on top of Hadoop, provides data analysis tools and, most importantly, translates queries into MapReduce and Spark jobs. It therefore exploits the scalability of Hadoop and offers data exploration and mining capabilities to non-developers. However, it is difficult for users to utilize the full potential of the Apache Spark execution engine, which results in very long execution times. This project work therefore gives researchers and companies a tuning methodology that can significantly improve the execution time of queries. As a result, this tuning methodology was able to speed up a real-world batch-processing query by a factor of 5. Moreover, it gives insight into the underlying reasons for this large improvement by using Apache Spark monitoring tools.
The result can be helpful for many practitioners and researchers who would like to optimise the performance of Spark and MapReduce queries executed in Hive on top of an Apache Hadoop cluster. Santos, Vitor Manuel Pereira Duarte dos RUN Forster, Rodrigo Richard 2018-11-26T14:59:01Z 2018-10-29 2018-10-29T00:00:00Z info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/masterThesis application/pdf http://hdl.handle.net/10362/52854 TID:202028755 eng info:eu-repo/semantics/openAccess reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP 2024-05-22T17:35:44Z oai:run.unl.pt:10362/52854 Portal Agregador ONG https://www.rcaap.pt/oai/openaire info@rcaap.pt opendoar:https://opendoar.ac.uk/repository/7160 2025-05-28T17:06:53.036272 Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia false
dc.title.none.fl_str_mv	Hive on spark and MapReduce : a methodology for parameter tuning
title	Hive on spark and MapReduce : a methodology for parameter tuning
spellingShingle	Hive on spark and MapReduce : a methodology for parameter tuning Forster, Rodrigo Richard Tuning Hive on Spark MapReduce Apache Spark Big Data HDFS Hadoop Data Warehouse
title_short	Hive on spark and MapReduce : a methodology for parameter tuning
title_full	Hive on spark and MapReduce : a methodology for parameter tuning
title_fullStr	Hive on spark and MapReduce : a methodology for parameter tuning
title_full_unstemmed	Hive on spark and MapReduce : a methodology for parameter tuning
title_sort	Hive on spark and MapReduce : a methodology for parameter tuning
author	Forster, Rodrigo Richard
author_facet	Forster, Rodrigo Richard
author_role	author
dc.contributor.none.fl_str_mv	Santos, Vitor Manuel Pereira Duarte dos RUN
dc.contributor.author.fl_str_mv	Forster, Rodrigo Richard
dc.subject.por.fl_str_mv	Tuning Hive on Spark MapReduce Apache Spark Big Data HDFS Hadoop Data Warehouse
topic	Tuning Hive on Spark MapReduce Apache Spark Big Data HDFS Hadoop Data Warehouse
description	Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Information Systems and Technologies Management
publishDate	2018
dc.date.none.fl_str_mv	2018-11-26T14:59:01Z 2018-10-29 2018-10-29T00:00:00Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10362/52854 TID:202028755
url	http://hdl.handle.net/10362/52854
identifier_str_mv	TID:202028755
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP
instname_str	FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv	info@rcaap.pt
_version_	1833596443640725504
