Um mecanismo de persistência para um ambiente de processamento de fluxos de dados

Ana Paula de Carvalho

Bibliographic details
Year of defense:	2012
Main author:	Ana Paula de Carvalho
Advisor:	Not provided by the institution
Defense committee:	Not provided by the institution
Document type:	Master's dissertation
Access type:	Open access
Language:	Portuguese (por)
Defending institution:	Universidade Federal de Minas Gerais (UFMG)
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords:	Data stream processing; Data stream persistence; High-performance computing; Computing
Access link:	http://hdl.handle.net/1843/ESBF-8XFK9G
Abstract:	The continuous evolution of technology in many areas of knowledge is making ever larger volumes of data available. As a result, there is now, more than ever, a real demand for applications able to process large bodies of data. In general, these applications need to run computationally intensive algorithms that process data streams at high performance. Several of these applications also require data streams to be persisted, mainly for three reasons: i) to enable tracing of the transformations applied to the data, ii) to allow the data to be analyzed in the future, and iii) to allow the data to be reprocessed in case of failure. The overall objective of this work is to contribute to the design and implementation of Watershed, a high-performance execution environment that provides abstractions for developing distributed applications that process massive data streams. To that end, this dissertation proposes a data stream persistence mechanism that can be coupled to Watershed. The execution environment implements the filter-stream programming model, in which each application is decomposed into processing modules (filters) that communicate through channels called streams. Several features distinguish Watershed from most environments and systems described in the literature: support for developing and running applications with dynamic topology, support for the simultaneous execution of multiple applications, and the possibility of sharing intermediate results among applications. The proposed persistence mechanism makes Watershed a more general and flexible environment, since it allows processing modules to run at different times while having all previously produced data available for consumption in addition to the current data. The mechanism is also distributed, stores data transparently, supports semi-structured data, and lets a processing module (filter) consume from a stream only the data units of interest to it, whether current or historical (stored). In the experiments, the persistence mechanism increased application execution time by up to 13%.
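
The idea of a persisted stream that a late-starting filter can consume selectively may be easier to picture with a small sketch. The Python below is only an illustration under stated assumptions: the names `PersistentStream`, `Filter`, `publish`, `read_from` and `poll` are hypothetical and are not Watershed's API, and an in-memory list stands in for the distributed storage the abstract describes.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Assumption: a semi-structured data unit is modelled as a plain dict.
DataUnit = Dict[str, Any]


@dataclass
class PersistentStream:
    """Hypothetical stream that keeps every data unit published to it."""
    name: str
    _log: List[DataUnit] = field(default_factory=list)

    def publish(self, unit: DataUnit) -> None:
        # Persist the unit; a list stands in for distributed storage.
        self._log.append(unit)

    def read_from(self, offset: int) -> Tuple[List[DataUnit], int]:
        # Return the units stored at or after `offset`, plus the next offset.
        return self._log[offset:], len(self._log)


@dataclass
class Filter:
    """Hypothetical processing module reading only the units it cares about."""
    source: PersistentStream
    predicate: Callable[[DataUnit], bool]
    _offset: int = 0

    def poll(self) -> List[DataUnit]:
        # The first call sees historical units; later calls see only new ones.
        units, self._offset = self.source.read_from(self._offset)
        return [u for u in units if self.predicate(u)]


stream = PersistentStream("readings")
stream.publish({"sensor": "a", "value": 1})
stream.publish({"sensor": "b", "value": 2})

# This filter starts after the data was produced and still receives it.
late_filter = Filter(stream, lambda u: u.get("sensor") == "a")
history = late_filter.poll()

stream.publish({"sensor": "a", "value": 3})
current = late_filter.poll()

print(history)
print(current)
```

Tracking a per-filter offset is one simple way to let modules run at different times over the same stream; the dissertation's actual design may differ.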
