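Response time prediction for parallel massive data-processing applications in cloud environments (original title: Previsão do tempo de resposta de aplicações paralelas de processamento de dados massivos em ambientes de nuvem)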

Bibliographic details
Year of defense: 2019
Main author: Tulio Braga Moreira Pinto
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: Portuguese (por)
Defending institution: Universidade Federal de Minas Gerais
Brazil
ICX - DEPARTAMENTO DE CIÊNCIA DA COMPUTAÇÃO
Programa de Pós-Graduação em Ciência da Computação
UFMG
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://hdl.handle.net/1843/44157
Abstract: The popularity of online and data-intensive applications has brought new challenges to computing. Although cloud computing has enabled on-demand resource scheduling, the heterogeneous and irregular data access patterns of data-intensive applications have made both hardware and software resource scheduling more difficult. Consequently, predicting the performance (e.g., response time) of such applications becomes more complex as these characteristics combine. This research therefore explores two analytical models for predicting the response time of parallel applications running on Apache Spark, one of the most popular frameworks for massive data processing. The first model is based on fork/join queues, in which an application is split into N tasks that are processed in parallel on multiple servers. This model captures the synchronization delay imposed by the slowest server. The second model is based on queuing networks. It uses the precedence relationships between the application's tasks to compute the synchronization delays. Multiple experimental scenarios were considered, including the parallel wordcount algorithm, common machine learning algorithms such as SVM, Logistic Regression, and K-Means, and ad hoc data analytics queries. The precedence relationship model presented a mean error below 20% in most of the experimental scenarios, which is typically considered reasonable for analytical models. In addition, both models execute in a matter of milliseconds. Such a low execution time enables the use of the models for the dynamic provisioning of parallel systems, an important task for guaranteeing the quality of service of massive data-processing applications. Both analytical models were compared with the DagSim simulation model, the state-of-the-art model for performance prediction of Hadoop and Spark applications.
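To illustrate the fork/join idea described in the abstract, the following is a minimal sketch and not the model from the dissertation. It assumes, purely for illustration, that the N tasks have independent exponentially distributed service times with a common rate and that no queuing occurs; the function names and the parameters service_rate, runs, and seed are hypothetical. Under those assumptions the join has to wait for the slowest task, and the expected waiting time is the N-th harmonic number divided by the rate.

```python
import random


def harmonic(n: int) -> float:
    """Return the n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


def expected_join_time(n_tasks: int, service_rate: float) -> float:
    """Expected time until the slowest of n_tasks parallel tasks finishes.

    Assumption (not from the dissertation): task times are i.i.d.
    exponential with the given rate, so E[max] = H_n / service_rate.
    """
    return harmonic(n_tasks) / service_rate


def simulated_join_time(n_tasks: int, service_rate: float,
                        runs: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, for a sanity check."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        # The join completes only when the slowest task completes.
        total += max(rng.expovariate(service_rate) for _ in range(n_tasks))
    return total / runs


if __name__ == "__main__":
    for n in (1, 2, 8, 32):
        analytic = expected_join_time(n, service_rate=1.0)
        simulated = simulated_join_time(n, service_rate=1.0)
        print(f"N={n:2d}  analytic={analytic:.3f}  simulated={simulated:.3f}")
```

The sketch shows why the synchronization delay grows with the degree of parallelism: even when the mean task time is fixed, the expected time of the slowest task grows roughly logarithmically with N.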