Processamento eficiente de junção espacial em ambiente paralelo e distribuído baseado em Spatialhadoop

Mendes, Eduardo Fernando

Processamento eficiente de junção espacial em ambiente paralelo e distribuído baseado em Spatialhadoop

Detalhes bibliográficos
Ano de defesa:	2017
Autor(a) principal:	Mendes, Eduardo Fernando
Orientador(a):	Ciferri, Ricardo Rodrigues
Banca de defesa:	Não Informado pela instituição
Tipo de documento:	Tese
Tipo de acesso:	Acesso aberto
Idioma:	por
Instituição de defesa:	Universidade Federal de São Carlos Câmpus São Carlos
Programa de Pós-Graduação:	Programa de Pós-Graduação em Ciência da Computação - PPGCC
Departamento:	Não Informado pela instituição
País:	Não Informado pela instituição
Palavras-chave em Português:	Banco de dados espaciais Processamento de consulta Junção espacial Processamento paralelo e distribuído Clusters de computadores
Palavras-chave em Inglês:	Spatial databases Query processing Spatial join Parallel and distributed processing Cluster computing
Área do conhecimento CNPq:	CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
Link de acesso:	https://repositorio.ufscar.br/handle/20.500.14289/9168
Resumo:	The huge volume of spatial data generated and made available in recent years from different sources, such as remote sensing, smart phones, space telescopes, and satellites, has motivated researchers and practitioners around the world to find out a way to process efficiently this huge volume of spatial data. Systems based on the MapReduce programming paradigm, such as Hadoop, have proven to be an efficient framework for processing huge volumes of data in many applications. However, Hadoop has showed not to be adequate in native support for spatial data due to its central structure is not aware of the spatial characteristics of such data. The solution to this problem gave rise to SpatialHadoop, which is a Hadoop extension with native support for spatial data. However, SpatialHadoop does not enable to jointly allocate related spatial data and also does not take into account any characteristics of the data in the process of task scheduler for processing on the nodes of a cluster of computers. Given this scenario, this PhD dissertation aims to propose new strategies to improve the performance of the processing of the spatial join operations for huge volumes of data using SpatialHadoop. For this purpose, the proposed solutions explore the joint allocation of related spatial data and the scheduling strategy of MapReduce for related spatial data also allocated in a jointly form. The efficient data access is an essential step in achieving better performance during query processing. Therefore, the proposed solutions allow the reduction of network traffic and I/O operations to the disk and consequently improve the performance of spatial join processing by using SpatialHadoop. By means of experimental evaluations, it was possible to show that the novel data allocation policies and scheduling tasks actually improve the total processing time of the spatial join operations. The performance gain varied from 14.7% to 23.6% if compared to the baseline proposed by CoS-HDFS and varied from 8.3% to 65% if compared to the native support of SpatialHadoop.

Processamento eficiente de junção espacial em ambiente paralelo e distribuído baseado em Spatialhadoop

Registros relacionados