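Identifying difficulties and issues of interest of developers of Big Data applications with the Apache Spark framework (original title: Identificação de dificuldades e questões de interesse de desenvolvedores de aplicações para Big Data com o framework Apache Spark)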

Albuquerque, Denis José Sousa de

Bibliographic details
Year of defense:	2019
Main author:	Albuquerque, Denis José Sousa de
Advisor:	Not provided by the institution
Examination committee:	Not provided by the institution
Document type:	Dissertation
Access type:	Open access
Language:	Portuguese (por)
Defending institution:	Brazil; UFRN; Programa de Pós-Graduação em Sistemas e Computação
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords:	Big Data; Apache Spark; Probabilistic topic modeling; Latent Dirichlet Allocation (LDA); Stack Overflow; Taxonomy; CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::SISTEMAS DE COMPUTACAO
Access link:	https://repositorio.ufrn.br/jspui/handle/123456789/28122
Abstract:	This research aims to identify and classify the main difficulties and issues of interest that Apache Spark application developers face when using the framework. To this end, we apply the Latent Dirichlet Allocation (LDA) algorithm to build a probabilistic topic model of information extracted from Stack Overflow, since manually inspecting the entire dataset is not feasible. Drawing on a comprehensive study of related work, we established and applied a methodology based on commonly employed practices. We developed Spark applications to automate tasks such as data selection and preparation, topic discovery (applying the probabilistic modeling algorithm with various configurations), and metrics computation. The results were analyzed by a group of five researchers: two professors holding doctorates, one doctoral student, and two master's students. Based on a semantic analysis of the labels assigned to each identified topic, we constructed a taxonomy of interests and difficulties. Finally, we ranked the most important themes according to the calculated metrics and compared the methods and results of our study with those presented in another work.
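
The topic-discovery step mentioned in the abstract can be illustrated with a small, dependency-free sketch. This is not the dissertation's code: the work used Spark applications, whose implementation is not reproduced in this record. The sketch below swaps in plain Python and a collapsed Gibbs sampler for LDA, and the toy corpus, the function names (`lda_gibbs`, `top_words`), and the hyperparameter values are all assumptions made for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA on tokenized documents with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): doc_topic[d][k] is the number of tokens
    in document d assigned to topic k; topic_word[k] is a Counter of the
    words assigned to topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_words(word_counts, n=4):
    """Return up to n of the most frequent words of a topic (nonzero counts only)."""
    return [w for w, c in word_counts.most_common() if c > 0][:n]


# Invented toy corpus standing in for preprocessed Stack Overflow posts.
docs = [
    "spark dataframe join column null".split(),
    "dataframe column schema cast join".split(),
    "executor memory error cluster yarn".split(),
    "yarn cluster submit executor memory".split(),
    "streaming kafka offset window batch".split(),
    "kafka streaming batch checkpoint offset".split(),
]

# Try more than one configuration, echoing the "various configurations" step.
for k in (2, 3):
    _, topic_word = lda_gibbs(docs, num_topics=k)
    for t, counts in enumerate(topic_word):
        print(f"K={k} topic {t}: {top_words(counts)}")
```

In a study like the one described, the top words of each topic would then be read by people and given a label; the sampler only groups words, it does not name the topics.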
