Aumentando os benefícios SIMD por meio de uma detecção de DLP em tempo de execução e energeticamente eficiente
Ano de defesa: | 2019 |
---|---|
Autor(a) principal: | |
Orientador(a): | |
Banca de defesa: | |
Tipo de documento: | Dissertação |
Tipo de acesso: | Acesso aberto |
Idioma: | por |
Instituição de defesa: |
Universidade Federal de Santa Maria
Brasil Ciência da Computação UFSM Programa de Pós-Graduação em Ciência da Computação Centro de Tecnologia |
Programa de Pós-Graduação: |
Não Informado pela instituição
|
Departamento: |
Não Informado pela instituição
|
País: |
Não Informado pela instituição
|
Palavras-chave em Português: | |
Link de acesso: | http://repositorio.ufsm.br/handle/1/16789 |
Resumo: | Multimedia applications have been widely present in embedded devices. Due to their intrinsic nature, such application domain is benefited from Data Level Parallelism (DLP). In order to improve performance-energy tradeoff, current processors enable DLP by coupling SIMD (Single Instruction Multiple Data) engines, such as Intel AVX, ARM NEON and IBM Altivec. Special libraries and compilers are used to support DLP execution on such engines. However, timing overhead on hand coding is inevitable since most software developers are not skilled to extract DLP using unfamiliar libraries. Considering the auto-vectorization through compiler, although improving software productivity, it breaks software compatibility. Besides, both methods are limited to static code analysis, which compromises performance gains. In this dissertation, we propose a runtime DLP detection named as Dynamic SIMD Assembler (DSA), which transparently identifies vectorizable code regions to execute in the ARM NEON engine. Due to its dynamic fashion, DSA keeps software compatibility and avoids timing overhead on software developing process. Results show that DSA outperforms ARM NEON auto-vectorization compiler by 32% since it applies the partial vectorization of loops and covers wider vectorizable regions, such as Dynamic Range, Sentinel and Conditional Loops. In addition, DSA outperforms hand-vectorized code using ARM library by 26% reducing 45% of energy consumption with no penalties over software development time. |