High-level Programming of Many-Core Architectures

Bibliographic Details
Main Author: Marques, Frederico Mariano de Almeida
Publication Date: 2016
Format: Master thesis
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: http://hdl.handle.net/10362/166874
Summary: Nowadays, computational systems are heterogeneous, typically containing both CPUs and GPUs. Offloading computations to a GPU allows more computations per unit of time, especially for workloads such as matrix multiplication, where GPUs are more efficient than CPUs. Writing portable code nonetheless carries its own complexity, especially when GPU programming is integrated into mainstream programming languages in order to avoid complex low-level frameworks such as CUDA and OpenCL. The proposals put forth to address these issues fall into two groups: specialized skeleton frameworks with reduced expressiveness, and languages (or language extensions) that follow a best-effort strategy. Efficient use of these languages, however, requires the programmer to reason about code compatibility while considering the GPU's execution model and the limitations of the compilers themselves. The approach documented here takes a different route, offering a programming model in which the programmer expresses GPU computations by manipulating data structures in a way that is equivalent to the computations designed for the CPU. This expressiveness is achieved without sacrificing the guarantee that the expressed computations will run in a GPU context. The implementation rests on an expression template layer, which generates both the Marrow framework code that launches the OpenCL kernels and the OpenCL kernels themselves. Code generation happens only the first time a computation is executed; the generated code is reused in subsequent executions. The tests developed showed a significant reduction in lines of code. The technique's overhead has little impact on computation time, as shown by our results; most notably, the overhead is not proportional to the complexity of the expression. Our results point to an overhead of 9 to 10% in the worst-case scenario, approaching 0% as the OpenCL computation time increases.
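The summary describes an expression template layer that turns ordinary-looking expressions over data structures into OpenCL kernels, generating code only the first time a computation is seen. The C++ sketch below illustrates that general idea under stated assumptions; it is not the thesis's implementation or the Marrow API. The names `Vec`, `BinOp`, `kernel_source`, and `KernelCache` are hypothetical, the kernel is only produced as a source string (nothing is compiled or launched), and the in-memory cache keyed by the expression text is an assumed stand-in for whatever reuse mechanism the thesis actually employs.

```cpp
#include <map>
#include <set>
#include <string>
#include <type_traits>

// Hypothetical leaf node: stands for one input buffer of the kernel.
struct Vec {
    std::string name;
    std::string code() const { return name + "[i]"; }
    void collect(std::set<std::string>& out) const { out.insert(name); }
};

// Binary node: operand types are part of the C++ type, so the shape of the
// expression tree is fixed at compile time (the core of expression templates).
template <typename L, typename R>
struct BinOp {
    L lhs;
    R rhs;
    char op;
    std::string code() const {
        return "(" + lhs.code() + " " + op + " " + rhs.code() + ")";
    }
    void collect(std::set<std::string>& out) const {
        lhs.collect(out);
        rhs.collect(out);
    }
};

// Trait that restricts the overloaded operators to our own node types.
template <typename T> struct is_expr : std::false_type {};
template <> struct is_expr<Vec> : std::true_type {};
template <typename L, typename R>
struct is_expr<BinOp<L, R>> : std::true_type {};

template <typename L, typename R,
          typename = std::enable_if_t<is_expr<L>::value && is_expr<R>::value>>
BinOp<L, R> operator+(const L& l, const R& r) { return {l, r, '+'}; }

template <typename L, typename R,
          typename = std::enable_if_t<is_expr<L>::value && is_expr<R>::value>>
BinOp<L, R> operator*(const L& l, const R& r) { return {l, r, '*'}; }

// Emits OpenCL C source for an element-wise kernel computing the expression.
template <typename E>
std::string kernel_source(const E& e) {
    std::set<std::string> names;  // unique input buffers, sorted by name
    e.collect(names);
    std::string src = "__kernel void generated(";
    for (const auto& n : names) src += "__global const float* " + n + ", ";
    src += "__global float* out) {\n"
           "  size_t i = get_global_id(0);\n"
           "  out[i] = " + e.code() + ";\n}\n";
    return src;
}

// Assumed caching scheme: generate on first use, reuse afterwards.
struct KernelCache {
    std::map<std::string, std::string> sources;
    int generations = 0;  // how many times code was actually generated

    template <typename E>
    const std::string& get(const E& e) {
        const std::string key = e.code();
        auto it = sources.find(key);
        if (it == sources.end()) {
            ++generations;
            it = sources.emplace(key, kernel_source(e)).first;
        }
        return it->second;
    }
};
```

With `Vec a{"a"}, b{"b"}, c{"c"};`, writing `a + b * c` performs no arithmetic. It builds a `BinOp<Vec, BinOp<Vec, Vec>>` whose `code()` is `(a[i] + (b[i] * c[i]))`, and a second `KernelCache::get` call for the same expression returns the stored source without regenerating it.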
id RCAP_dc82b62b25b5243f1210113b6b4d52df
oai_identifier_str oai:run.unl.pt:10362/166874
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
Summary (translated from the Portuguese abstract): Computational systems today are heterogeneous, typically containing CPUs and GPUs. Offloading computations to the GPU allows more computations to be executed per unit of time, especially since GPUs are known to be more efficient than CPUs for some types of computation, such as matrix multiplication. To avoid the complexity associated with low-level frameworks such as CUDA and OpenCL, GPU programming is being integrated into mainstream languages. However, writing portable code that abstracts away the details of GPU programming has a complexity of its own. The proposals made to address these issues fall into two groups: highly specialized skeleton libraries whose expressiveness is, by design, very limited, and languages or language extensions that adopt a best-effort philosophy. Efficient use of these languages, however, requires the programmer to reason about the compatibility of their code with the GPU execution model and about the limitations of the compilers themselves. The approach documented here is different: it offers a programming model in which the programmer expresses GPU computations by manipulating data structures in a way equivalent to computations for the CPU. This expressiveness is achieved without compromising the guarantee that the computations are executed in a GPU context. To achieve this, the solution uses the Marrow framework to execute the OpenCL kernels. The methodology for implementing the approach rests on an expression template layer. This layer generates the library code and the backend code, which in our case map to Marrow code and OpenCL code. The generated code is reused from earlier executions, with generation taking place on the first execution of the computation. The tests carried out showed a reduction in lines of code, with the OpenCL and Marrow code reduced by 100%. The overhead of the chosen technique has little impact, as the tests confirm, and it is not proportional to the complexity of the evaluated expression; the results indicate an overhead of 9% to 10% in the worst case, tending towards 0% as the OpenCL computation takes more time.
dc.title.none.fl_str_mv High-level Programming of Many-Core Architectures
author Marques, Frederico Mariano de Almeida
author_facet Marques, Frederico Mariano de Almeida
author_role author
dc.contributor.none.fl_str_mv Paulino, Hervé
RUN
dc.contributor.author.fl_str_mv Marques, Frederico Mariano de Almeida
dc.subject.por.fl_str_mv Expression templates
Algorithmic skeletons
Heterogeneous Computation
Marrow
Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
publishDate 2016
dc.date.none.fl_str_mv 2016-06
2016-06-01T00:00:00Z
2024-05-02T15:38:46Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/166874
url http://hdl.handle.net/10362/166874
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833597016852135936