Challenging SQL-on-Hadoop performance with Apache Druid

Bibliographic details
Main author: Correia, José
Publication date: 2019
Other authors: Costa, Carlos A. P., Santos, Maribel Yasmina
Language: eng
Source title: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Full text: http://hdl.handle.net/1822/66785
Abstract: In Big Data, SQL-on-Hadoop tools usually provide satisfactory performance for processing vast amounts of data, although new emerging tools may be an alternative. This paper evaluates whether Apache Druid, an innovative column-oriented data store suited for online analytical processing workloads, is an alternative to some of the well-known SQL-on-Hadoop technologies, and assesses its potential in this role. In this evaluation, Druid, Hive and Presto are benchmarked with increasing data volumes. The results point to Druid as a strong alternative, achieving better performance than Hive and Presto, and show the potential of integrating Hive and Druid, enhancing the capabilities of both tools.
id RCAP_8e1e4e0c963026a6ea2bf3bccfedb941
oai_identifier_str oai:repositorium.sdum.uminho.pt:1822/66785
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
Funding: This work is supported by COMPETE: POCI-01-0145-FEDER-007043 and FCT - Fundacao para a Ciencia e Tecnologia within Project UID/CEC/00319/2013 and by European Structural and Investment Funds in the FEDER component, COMPETE 2020 (Funding Reference: POCI-01-0247-FEDER-002814).
dc.title.none.fl_str_mv Challenging SQL-on-Hadoop performance with Apache Druid
dc.contributor.none.fl_str_mv Universidade do Minho
dc.contributor.author.fl_str_mv Correia, José
Costa, Carlos A. P.
Santos, Maribel Yasmina
dc.subject.por.fl_str_mv Big Data
Big Data Warehouse
SQL-on-Hadoop
Druid
OLAP
Science & Technology
dc.date.none.fl_str_mv 2019
2019-01-01T00:00:00Z
dc.type.driver.fl_str_mv conference paper
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/1822/66785
dc.language.iso.fl_str_mv eng
dc.relation.none.fl_str_mv 9783030204846
1865-1348
10.1007/978-3-030-20485-3_12
https://link.springer.com/chapter/10.1007%2F978-3-030-20485-3_12
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Springer Verlag
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt