Watershed-ng: a distributed and extensible system for data stream processing
Year of defense: | 2015 |
---|---|
Main author: | |
Advisor: | |
Examining committee: | |
Document type: | Master's dissertation |
Access type: | Open access |
Language: | Portuguese (por) |
Defense institution: | Universidade Federal de Minas Gerais (UFMG) |
Graduate program: | Not provided by the institution |
Department: | Not provided by the institution |
Country: | Not provided by the institution |
Keywords (Portuguese): | |
Access link: | http://hdl.handle.net/1843/ESBF-A2EQRM |
Abstract: | Most high-performance data processing (a.k.a. big-data) systems allow users to express their computation using abstractions (like MapReduce) that simplify the extraction of parallelism from applications. Most frameworks, however, do not allow users to specify how communication must take place: that element is deeply embedded in the run-time system (RTS) abstractions, making changes hard to implement. In this work we describe Watershed-ng, our re-engineering of the Watershed system, a framework based on the filter-stream paradigm and originally focused on continuous stream processing. Like other big-data environments, Watershed provided object-oriented abstractions to express computation (filters), but the implementation of streams was an RTS element. By isolating stream functionality into appropriate classes, it becomes possible to combine communication patterns and to reuse common message-handling functions (like compression and blocking). The new architecture even allows the design of new communication patterns, for example, letting users choose MPI, TCP, or shared-memory implementations of communication channels as their problem demands. Applications designed for the new interface showed code-size reductions on the order of 50%, and more in some cases. Performance also improved significantly, since some implementation bottlenecks were removed during the re-engineering process. |
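The abstract describes streams as classes whose channel implementation (MPI, TCP, or shared memory) is chosen per problem, with reusable message-handling functions such as compression and blocking layered on top. The following is a minimal sketch of that layering in Python, under stated assumptions: the names `Channel`, `InMemoryChannel`, `CompressingChannel`, `BlockingChannel`, and `make_stream` are hypothetical and are not Watershed-ng's actual API; an in-process queue stands in for a shared-memory channel, and MPI or TCP transports would be further `Channel` subclasses not shown here. Compression uses Python's standard `zlib` module, and blocking is modeled as batching a fixed number of length-prefixed messages into one transfer.

```python
import zlib
from abc import ABC, abstractmethod
from collections import deque
from typing import Deque, List, Optional


class Channel(ABC):
    """Hypothetical transport interface: moves opaque byte payloads between filters."""

    @abstractmethod
    def send(self, payload: bytes) -> None: ...

    @abstractmethod
    def receive(self) -> Optional[bytes]:
        """Return the next payload, or None if nothing is available."""

    def flush(self) -> None:
        """Push out any buffered data; a no-op for unbuffered channels."""


class InMemoryChannel(Channel):
    """In-process FIFO queue, standing in (assumption) for a shared-memory channel."""

    def __init__(self) -> None:
        self._queue: Deque[bytes] = deque()

    def send(self, payload: bytes) -> None:
        self._queue.append(payload)

    def receive(self) -> Optional[bytes]:
        return self._queue.popleft() if self._queue else None


class CompressingChannel(Channel):
    """Reusable handler: compresses on send and decompresses on receive."""

    def __init__(self, inner: Channel) -> None:
        self._inner = inner

    def send(self, payload: bytes) -> None:
        self._inner.send(zlib.compress(payload))

    def receive(self) -> Optional[bytes]:
        data = self._inner.receive()
        return None if data is None else zlib.decompress(data)

    def flush(self) -> None:
        self._inner.flush()


class BlockingChannel(Channel):
    """Reusable handler: groups block_size messages into one transfer.

    Each message inside a block is framed with a 4-byte big-endian length prefix.
    """

    def __init__(self, inner: Channel, block_size: int) -> None:
        self._inner = inner
        self._block_size = block_size
        self._pending: List[bytes] = []       # messages waiting to be sent
        self._ready: Deque[bytes] = deque()   # messages unpacked from a received block

    def send(self, payload: bytes) -> None:
        self._pending.append(payload)
        if len(self._pending) >= self._block_size:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            block = b"".join(len(m).to_bytes(4, "big") + m for m in self._pending)
            self._pending.clear()
            self._inner.send(block)
        self._inner.flush()

    def receive(self) -> Optional[bytes]:
        if not self._ready:
            block = self._inner.receive()
            if block is None:
                return None
            pos = 0
            while pos < len(block):  # unpack every length-prefixed message
                size = int.from_bytes(block[pos:pos + 4], "big")
                self._ready.append(block[pos + 4:pos + 4 + size])
                pos += 4 + size
        return self._ready.popleft() if self._ready else None


def make_stream(block_size: int) -> Channel:
    """Hypothetical helper: blocking over compression over the in-memory transport."""
    return BlockingChannel(CompressingChannel(InMemoryChannel()), block_size)


stream = make_stream(block_size=2)
for word in ["alpha", "beta", "gamma"]:
    stream.send(word.encode())
stream.flush()  # send the last, partially filled block

received = []
while (msg := stream.receive()) is not None:
    received.append(msg.decode())
print(received)  # ['alpha', 'beta', 'gamma']
```

Because each handler wraps another `Channel`, handlers can be stacked in different orders or placed over a different transport without touching the filters. In this sketch blocking sits above compression, so whole blocks, rather than individual messages, are compressed.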