High-level Programming of Many-Core Architectures

Marques, Frederico Mariano de Almeida

Bibliographic Details
Main Author:	Marques, Frederico Mariano de Almeida
Publication Date:	2016
Format:	Master's thesis
Language:	eng
Source:	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full text:	http://hdl.handle.net/10362/166874
Summary:	Today's computing systems are heterogeneous, typically containing both CPUs and GPUs. Offloading computations to a GPU allows more computations per unit of time, especially for workloads on which GPUs are more efficient than CPUs, such as matrix multiplication. There is, nonetheless, a specific complexity associated with writing portable code, especially when integrating GPU programming into mainstream programming languages while avoiding complex low-level frameworks such as CUDA and OpenCL. The proposals put forth to address these issues fall into two categories: specialized skeleton frameworks with reduced expressiveness, and languages (or language extensions) that follow a best-effort strategy. Using these languages efficiently, however, requires the programmer to reason about code compatibility with the GPU execution model and about the limitations of the compilers themselves. The approach documented here takes a different route: it offers a programming model in which the programmer expresses GPU computations by manipulating data structures in the same way as for computations designed for the CPU. This expressiveness is achieved without compromising the guarantee that the expressed computations will run in a GPU context. The approach is implemented as an expression template layer, which generates both the Marrow framework code that launches the OpenCL kernels and the OpenCL kernels themselves. Code generation happens only the first time a computation is executed; the generated code is reused in subsequent executions. The tests showed a significant reduction in lines of code, with the OpenCL and Marrow code reduced by 100%.
The technique's overhead has little impact on computation time, as our results show; most notably, the overhead is not proportional to the complexity of the expression. Our results point to an overhead of 9 to 10% in the worst case, approaching 0% as the OpenCL computation time increases.
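As a rough illustration of why an overhead that does not grow with expression complexity fades in relative terms, consider a simplified model of our own (not taken from the thesis): assume the expression-template layer adds a roughly constant cost $t_o$, and that the overhead is measured relative to the OpenCL computation time $t_{\mathrm{cl}}$. The relative overhead $r$ then behaves as follows; the numbers in the second line are hypothetical, not measured results.

```latex
r(t_{\mathrm{cl}}) = \frac{t_o}{t_{\mathrm{cl}}}, \qquad
\lim_{t_{\mathrm{cl}} \to \infty} r(t_{\mathrm{cl}}) = 0

r(T) = 0.10 \;\Rightarrow\; t_o = 0.1\,T, \qquad
r(10T) = \frac{0.1\,T}{10\,T} = 0.01
```

Under this assumption, a kernel that runs ten times longer sees its relative overhead drop from 10% to 1%.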

Item metadata

id	RCAP_dc82b62b25b5243f1210113b6b4d52df
oai_identifier_str	oai:run.unl.pt:10362/166874
network_acronym_str	RCAP
network_name_str	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str	https://opendoar.ac.uk/repository/7160
Resumo (Portuguese abstract, English translation)	Today's computing systems are heterogeneous, typically containing CPUs and GPUs. Offloading computations to the GPU makes it possible to perform more computations per unit of time, especially since GPUs are known to be more efficient than CPUs for some kinds of computation, such as matrix multiplication. To avoid the complexity of low-level frameworks such as CUDA and OpenCL, the aim is to integrate GPU programming into mainstream languages. However, writing portable code that abstracts away the details of GPU programming has a complexity of its own. The proposals made to address these issues can be divided into two camps: highly specialized skeleton libraries that, by design, have very limited expressiveness, and languages or language extensions that adopt a best-effort philosophy. Using these languages efficiently, however, requires the programmer to reason about the compatibility of their code with the GPU execution model and about the limitations of the compilers themselves. The approach documented here is different: it offers a programming model in which the programmer expresses GPU computations by manipulating data structures in a way equivalent to CPU computations. This expressiveness is achieved without compromising the guarantee that the computations are executed in a GPU context. To achieve this, the solution uses the Marrow framework to execute the OpenCL kernels. The methodology for implementing the approach rests on an expression template layer, which generates library code and backend code; in our case these map to Marrow code and OpenCL code. Generated code is reused from previous executions, with generation taking place on the first execution of a computation. The tests showed a reduction in lines of code, with the OpenCL and Marrow code reduced by 100%. The overhead of the chosen technique has little impact, as the tests confirm, and it is not proportional to the complexity of the evaluated expression; the results indicate an overhead of 9% to 10% in the worst case, tending to 0% as the OpenCL computation takes more time.
dc.title.none.fl_str_mv	High-level Programming of Many-Core Architectures
author	Marques, Frederico Mariano de Almeida
author_role	author
dc.contributor.none.fl_str_mv	Paulino, Hervé RUN
dc.contributor.author.fl_str_mv	Marques, Frederico Mariano de Almeida
dc.subject.por.fl_str_mv	Expression templates Algorithmic skeletons Heterogeneous Computation Marrow Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
publishDate	2016
dc.date.none.fl_str_mv	2016-06 2016-06-01T00:00:00Z 2024-05-02T15:38:46Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10362/166874
url	http://hdl.handle.net/10362/166874
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP
instname_str	FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv	info@rcaap.pt
