Automatic Placement of Computation on Heterogeneous Hardware (Colocação Automática de Computação em Hardware Heterogêneo)
Year of defense: | 2016 |
---|---|
Main author: | |
Advisor: | |
Defense committee: | |
Document type: | Dissertation |
Access type: | Open access |
Language: | Portuguese (por) |
Defense institution: | Universidade Federal de Minas Gerais (UFMG) |
Graduate program: | Not informed by the institution |
Department: | Not informed by the institution |
Country: | Not informed by the institution |
Keywords in Portuguese: | |
Access link: | http://hdl.handle.net/1843/ESBF-AE2GAF |
Abstract: | Graphics Processing Units (GPUs) have revolutionized high-performance programming. They have reduced the cost of parallel hardware; however, programming these devices is still a challenge, because programmers are not (yet) able to write code that coordinates the simultaneous execution of thousands of threads. To deal with this problem, industry and academia have introduced annotation systems. Examples of such systems are OpenMP 4.0, OmpSs, and OpenACC, which let developers indicate which parts of a C or Fortran program should run on the GPU or on the CPU. This approach has two advantages. First, it lets programmers obtain the benefits of the parallel hardware while coding in their preferred programming languages. Second, the annotations shield programmers from the details of the parallel hardware, since they delegate the task of parallelizing the program to the code generator. However, the inclusion of pragmas to hide hardware details does not solve all the problems developers face when programming GPUs: they still need to identify when it is advantageous to run a given piece of code on the GPU. In this context, the objective of this dissertation is to present a solution to that problem. We designed, implemented, and tested techniques that automatically identify which portions of the code should run on the GPU, using dependence information, memory layout, and control flow. We created a set of static analyses that perform three tasks: (i) identify which loops are parallelizable; (ii) insert annotations to copy data between the CPU and the GPU; (iii) estimate which loops, once tagged as parallel, are most likely to yield performance gains. These tasks are fully automatic and are carried out without any user intervention. The platform presented here has been implemented on two compilers: the analyses were built on top of the infrastructure available in LLVM, and the generation of parallel code from the annotated programs is done by PGCC. The approach we have developed is completely static: we decide where each function must run during the compilation of the program. This decision does not rely on any runtime system, such as a middleware, or on special computer-architecture hooks. Another benefit of this framework is that it is completely automatic, i.e., it requires no intervention from the programmer. As a result, the programs we produce automatically can be up to 121x faster than their original versions. |