Divergência em GPU: análises e alocação de registradores (GPU Divergence: Analyses and Register Allocation)

Diogo Nunes Sampaio

Bibliographic details
Year of defense:	2013
Main author:	Diogo Nunes Sampaio
Advisor:	Not provided by the institution
Examining committee:	Not provided by the institution
Document type:	Master's thesis (Dissertação)
Access type:	Open access
Language:	Portuguese (por)
Defending institution:	Universidade Federal de Minas Gerais (UFMG)
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords (Portuguese, translated):	Static analysis; Register allocation; Compilers; Rematerialization; Programming language; SIMT; SIMD; Divergences; GPU; Programming languages (Computers); Computing; Compilers (Computer programs)
Access link:	http://hdl.handle.net/1843/ESBF-97GJKT
Abstract:	The use of graphics processing units (GPUs) to accelerate data-parallel workloads is a new trend in the computing market. This growing interest has brought renewed attention to the Single Instruction, Multiple Data (SIMD) execution model. SIMD machines give application developers tremendous computational power; however, programming them is still challenging. In particular, developers must deal with memory and control-flow divergences. These phenomena stem from a condition that we call data divergence, which occurs whenever processing elements (PEs) that run in lockstep see the same variable name holding different values. To deal with divergences, this work introduces a new code analysis called Divergence Analysis with Affine Constraints. Application developers and compilers can use the information produced by this analysis for two purposes: first, to improve code generation for machines that have vector instructions but cannot handle control divergence; second, to optimize GPU code. To illustrate the latter, we present register allocators that rely on divergence information to make better use of the GPU memory hierarchy. When tested on a suite of well-known benchmarks, these optimized allocators produced GPU code that is 29.70% faster than the code produced by a conventional allocator.
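To make "data divergence" and "affine constraints" concrete, here is a minimal sketch, not the thesis's formulation. It assumes a hypothetical toy straight-line IR with the invented ops `tid`, `const`, `param`, `add`, `mul` and `load`, and it represents each variable either as `a*tid + b` (with `a`, `b` known integers or `TOP`, a uniform value unknown at compile time) or as divergent. A variable with `a == 0` holds the same value in every PE (uniform); one with a nonzero `a` is affine in the thread id; anything else is divergent. It deliberately ignores control flow (branches, phi-functions, loops) and memory writes, which a real divergence analysis must handle.

```python
from dataclasses import dataclass
from typing import Union

TOP = "T"          # a uniform quantity whose exact value is unknown at compile time
DIV = "divergent"  # a value not expressible as a*tid + b

Coef = Union[int, str]  # a known integer or TOP

@dataclass(frozen=True)
class Affine:
    a: Coef  # coefficient of the thread id
    b: Coef  # uniform offset

def c_add(x: Coef, y: Coef) -> Coef:
    return TOP if TOP in (x, y) else x + y

def c_mul(x: Coef, y: Coef) -> Coef:
    if x == 0 or y == 0:
        return 0  # zero absorbs even an unknown uniform
    return TOP if TOP in (x, y) else x * y

def transfer(op, operands, env):
    """Abstract semantics of one instruction of the hypothetical toy IR."""
    if op == "tid":
        return Affine(1, 0)
    if op == "const":
        return Affine(0, operands[0])
    if op == "param":
        return Affine(0, TOP)  # kernel parameter: same value in every PE
    vals = [env[name] for name in operands]
    if DIV in vals:
        return DIV
    if op == "add":
        x, y = vals
        return Affine(c_add(x.a, y.a), c_add(x.b, y.b))
    if op == "mul":
        x, y = vals
        if x.a != 0 and y.a != 0:
            return DIV  # product of two tid-dependent terms is not affine
        if x.a != 0:
            x, y = y, x  # make x the uniform factor
        return Affine(c_mul(x.b, y.a), c_mul(x.b, y.b))
    if op == "load":
        (addr,) = vals
        # Same address in every PE -> same value (assuming no concurrent writes).
        return Affine(0, TOP) if addr.a == 0 else DIV
    raise ValueError(f"unknown op {op}")

def analyze(program):
    env = {}
    for dest, op, *operands in program:
        env[dest] = transfer(op, operands, env)
    return env

def classify(v) -> str:
    if v == DIV:
        return "divergent"
    return "uniform" if v.a == 0 else "affine"

program = [
    ("i", "tid"),
    ("n", "param"),
    ("four", "const", 4),
    ("off", "mul", "i", "four"),     # 4*tid
    ("base", "mul", "n", "four"),    # 4*n
    ("addr", "add", "base", "off"),  # 4*tid + 4*n
    ("x", "load", "addr"),           # per-PE address
    ("y", "load", "base"),           # same address in every PE
    ("sq", "mul", "i", "i"),         # tid*tid
]
env = analyze(program)

if __name__ == "__main__":
    for name, *_ in program:
        print(name, classify(env[name]), env[name])
```

In this sketch `off` and `addr` come out affine, `base` and `y` uniform, and `x` and `sq` divergent. As a purely illustrative reading of why such information can matter to a register allocator: a uniform value needs only one copy per group of lockstep PEs rather than one per PE, and an affine value could in principle be recomputed (rematerialized) from `tid`; how the thesis's allocators actually exploit this is not reproduced here.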
