Performance diagnosis and dynamic reconfiguration in massive data processing

Bibliographic details
Year of defense: 2016
Main author: Vinicius Vitor dos Santos Dias
Advisor: Not informed by the institution
Defense committee: Not informed by the institution
Document type: Master's thesis (Dissertação)
Access type: Open access
Language: Portuguese
Institution of defense: Universidade Federal de Minas Gerais (UFMG)
Graduate program: Not informed by the institution
Department: Not informed by the institution
Country: Not informed by the institution
Keywords in Portuguese:
Access link: http://hdl.handle.net/1843/ESBF-AKUNB8
Abstract: The increasing amount of data being stored and the variety of algorithms proposed to meet the processing demands of data scientists have led to a new generation of computational environments and paradigms. These environments ease the task of programming through high-level abstractions; however, achieving ideal performance remains a challenge. In this work we investigate important factors concerning the performance of common big-data applications and take the Spark framework as the target of our contributions. In particular, we organize our analysis methodology around diagnosis dimensions, which allow the identification of uncommon scenarios that provide valuable information about the environment's limitations and possible actions to mitigate the issues. First, we validate our observations by showing the potential that manual adjustments have for improving application performance. Finally, we apply the lessons learned from the previous findings through the design and implementation of an extensible tool that automates the reconfiguration of Spark applications. Our tool takes logs from previous executions as input, enforces configurable adjustment policies over the collected statistics, and makes its decisions taking into account communication behaviors specific to the application under evaluation. To accomplish that, the tool identifies global parameters that should be updated, or points in the user program where the data partitioning can be adjusted, based on those policies. Our results show gains of up to 1.9× in the scenarios considered.
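
The abstract describes the tool's mechanism only at a high level. As a purely illustrative sketch, not the thesis' actual implementation, the Scala program below shows how a simple adjustment policy could map shuffle statistics extracted from a previous run's logs onto a new partition count, and then apply that decision both as a global Spark parameter and as an explicit repartitioning point in the user program. The ShuffleStats type, the targetPartitionBytes threshold, and the suggestedPartitions helper are hypothetical names introduced here for illustration.

import org.apache.spark.sql.SparkSession

// Hypothetical per-stage summary that, in the thesis' setting, would be
// extracted from the event logs of a previous execution.
case class ShuffleStats(stageId: Int, shuffleReadBytes: Long, numPartitions: Int)

object ReconfigureSketch {
  // Illustrative policy: size each partition close to an assumed 128 MiB target.
  val targetPartitionBytes: Long = 128L * 1024 * 1024

  // Derive a partition count from the observed shuffle volume.
  def suggestedPartitions(stats: ShuffleStats): Int =
    math.max(1, math.ceil(stats.shuffleReadBytes.toDouble / targetPartitionBytes).toInt)

  def main(args: Array[String]): Unit = {
    // Statistics collected from a previous run (hard-coded here for the sketch).
    val previousRun = ShuffleStats(stageId = 3, shuffleReadBytes = 4L * 1024 * 1024 * 1024, numPartitions = 8)
    val partitions = suggestedPartitions(previousRun)

    val spark = SparkSession.builder()
      .appName("reconfiguration-sketch")
      .master("local[*]")
      // Global parameter adjustment: shuffle parallelism derived from the policy.
      .config("spark.sql.shuffle.partitions", partitions.toString)
      .getOrCreate()

    // Point-in-program adjustment: repartition the data before a shuffle-heavy step.
    val data = spark.sparkContext.parallelize(1 to 1000000)
    val repartitioned = data.repartition(partitions)
    println(s"Using $partitions partitions; count = ${repartitioned.count()}")

    spark.stop()
  }
}

The actual tool, as described in the abstract, enforces configurable policies and distinguishes between global parameters and per-operation partitioning decisions; the single size-based rule above merely stands in for those policies.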