Automatic Placement of Computation on Heterogeneous Hardware (Colocação Automática de Computação em Hardware Heterogêneo)
Year of defense: | 2016 |
---|---|
Main author: | |
Advisor: | |
Defense committee: | |
Document type: | Dissertation |
Access type: | Open access |
Language: | Portuguese (por) |
Defense institution: | Universidade Federal de Minas Gerais (UFMG) |
Graduate program: | Not informed by the institution |
Department: | Not informed by the institution |
Country: | Not informed by the institution |
Keywords in Portuguese: | |
Access link: | http://hdl.handle.net/1843/ESBF-AE2GAF |
Abstract: | Graphics Processing Units (GPUs) have revolutionized high-performance programming. They have reduced the cost of parallel hardware; however, programming these devices is still a challenge, because programmers are not (yet) able to write code that coordinates the simultaneous execution of thousands of threads. To deal with this problem, industry and academia have introduced annotation systems. Examples of such systems are OpenMP 4.0, OmpSs, and OpenACC, which let developers indicate which parts of a C or Fortran program should run on the GPU or on the CPU. This approach has two advantages. First, it lets programmers obtain the benefits of the parallel hardware while coding in their preferred programming languages. Second, the annotations shield programmers from the details of the parallel hardware, since they delegate the task of parallelizing the program to the code generator. However, the inclusion of pragmas to hide hardware details does not solve all the problems developers face when programming GPUs: they still need to identify when it is advantageous to run a given piece of code on the GPU. In this context, the objective of this dissertation is to present a solution to that problem. We designed, implemented, and tested techniques that automatically identify which portions of the code should run on the GPU, using dependence information, memory layout, and control flow. We created a set of static analyses that perform three tasks: (i) identify which loops are parallelizable; (ii) insert annotations to copy data between the CPU and the GPU; (iii) estimate which loops, once tagged as parallel, are most likely to yield performance gains. These tasks are fully automatic and are carried out without any user intervention. The platform presented here has been implemented on two compilers: the analyses were built on top of the infrastructure available in LLVM, and the generation of parallel code from the annotated programs is done by PGCC. The approach we have developed is completely static: we decide where each function must run during the compilation of the program. This decision does not rely on any runtime system, such as a middleware, or on special computer-architecture hooks. Another benefit of this framework is that it is completely automatic, i.e., it requires no intervention from the programmer. As a result, the programs we produce automatically can be up to 121x faster than their original versions. |