Particionamento semi-automático de reduções cíclicas para execução em anthill
Ano de defesa: | 2006 |
---|---|
Autor(a) principal: | |
Orientador(a): | |
Banca de defesa: | |
Tipo de documento: | Dissertação |
Tipo de acesso: | Acesso aberto |
Idioma: | por |
Instituição de defesa: |
Universidade Federal de Minas Gerais
UFMG |
Programa de Pós-Graduação: |
Não Informado pela instituição
|
Departamento: |
Não Informado pela instituição
|
País: |
Não Informado pela instituição
|
Palavras-chave em Português: | |
Link de acesso: | http://hdl.handle.net/1843/RVMR-78VMDR |
Resumo: | Extracting information from large datasets is one of the challenging new demands in Computer Science. To accomplish that, several Data Mining techniques are being proposed continually in the literature. Such algorithms are computationally intensive, in addition to the fact of having to process very large input data. Besides, the new trend towards high performance computing is gearing towards clusters of computers, a cost effective alternative to the supercomputers of the past. This trend leads to a scenario in which achieving high performance must be accomplished by efficient parallel and distributed implementations of the applications. Anthill is a solution for distributed processing on clusters of workstations for which high performance have been achieved by efficient and scalable implementation of several Data Mining algorithms. In essence, it provides a framework for the development and execution of applications in distributed environments, where the applications must be decomposed into a set of filters that exchange information though streams in a pipeline fashion. The process of developing the applications, however, damands the programmers to perform such application decomposition which may not be the natural way in which they program. This work presents a tool of semiautomatic generation of such fiters from a basic sequential implementation of the algorithms. This process is divided in two stages: the first being the partition into filters of the sequential code, followed by the code generation for each of the identified filters. This work focuses mostly on the latter, the code generation itself. For the first stage, we rely on some semi-automatic steps that could be implemented to be fully automatic in future work. These steps are based on determining data dependencies within the code, and finding good partition places. Using such steps we generate an annotated version of the sequential code, that contains the partitioning information. The actual code generation is accomplished from the annotated code. Basically the annotations encompass a directed graph with vertices representing the filters and edges annotated with the data that should be communicated. We implemented a source-to-source compiler, where the initial sequential code is standard C, and the output is also C, with Anthill extensions. Most of the necessary adaptations for Anthill are generated automatically by our compiler. The compiler was validated using three different Data Mining applications that had previously been developed for Anthill. This time, we generated the Anthill code from sequential versions of the same algorithms. We evaluate the performance of the generated code and we observe that it is very similar to the hand-made implementations in most cases. This is a good result when noted that the effort to design the sequential code is much less than a fully parallel Anthill implementation. We also notice that there are some ad-hoc optimizations on the hand-made codes that could also be accomplished by a compiler in a further optimizing step. We plan to pursue such automatic optimizations as future work |