Extending, improving and optimizing Marrow

Bibliographic Details
Main Author: Cardoso, Francisco José Sampaio de Freitas
Publication Date: 2022
Format: Master's thesis
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: http://hdl.handle.net/10362/166574
Summary: Most computers nowadays are heterogeneous, composed of a Central Processing Unit (CPU) and one or more Graphics Processing Units (GPUs). To harness the power of each of these devices, developers must have experience with low-level toolchains such as CUDA and expert knowledge of the underlying architecture. However, these low-level approaches add several layers of complexity to the task at hand. High-level programming models such as the Marrow framework ease the arduous task of offloading computation to accelerator devices. They usually do so by abstracting memory management and implicitly parallelizing workloads, exposing high-level constructs to the programmer. However, these frameworks have several limitations, and it is not always possible to maximize performance, as this may require writing code that maps computation to a specific device. In this thesis we ported several programs implemented in other frameworks and platforms to the Marrow framework, which allowed us to better understand its limitations and to further extend and optimize the framework. We followed an iterative process: first, we analyzed how a given program was implemented in its original framework; second, we investigated whether the program could be implemented in Marrow’s current state; if not, we extended Marrow’s features to make the implementation possible. We then implemented and benchmarked the program and used the performance comparisons as a tool to further optimize the framework. Over the course of this thesis we implemented several applications with the Marrow framework, which allowed us to add new features, such as the inclusive scan, a matrix multiplication operation, and the zip and unzip functions, and to significantly improve the flexibility of Marrow’s constructs, such as its exclusive scan. Furthermore, we gained a better understanding of Marrow’s performance bottlenecks through the Marrow profiler, and optimized asynchronous memory transfers.
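The summary distinguishes the newly added inclusive scan from Marrow’s existing exclusive scan and mentions zip and unzip functions. As a minimal sketch of what these operations compute, the block below uses sequential, host-side C++17 (`std::inclusive_scan` and `std::exclusive_scan` from `<numeric>`). It is not Marrow’s API or its GPU implementation; the helper names `inclusive_prefix_sum`, `exclusive_prefix_sum`, `zip` and `unzip` are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper (not Marrow's API): inclusive scan, so out[i] = in[0] + ... + in[i].
std::vector<int> inclusive_prefix_sum(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::inclusive_scan(in.begin(), in.end(), out.begin());
    return out;
}

// Hypothetical helper (not Marrow's API): exclusive scan, so out[i] = init + in[0] + ... + in[i-1].
std::vector<int> exclusive_prefix_sum(const std::vector<int>& in, int init) {
    std::vector<int> out(in.size());
    std::exclusive_scan(in.begin(), in.end(), out.begin(), init);
    return out;
}

// Hypothetical zip: pairs a[i] with b[i], truncating to the shorter input.
template <typename A, typename B>
std::vector<std::pair<A, B>> zip(const std::vector<A>& a, const std::vector<B>& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    std::vector<std::pair<A, B>> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) out.emplace_back(a[i], b[i]);
    return out;
}

// Hypothetical unzip: splits a vector of pairs back into two vectors.
template <typename A, typename B>
std::pair<std::vector<A>, std::vector<B>> unzip(const std::vector<std::pair<A, B>>& v) {
    std::pair<std::vector<A>, std::vector<B>> out;
    out.first.reserve(v.size());
    out.second.reserve(v.size());
    for (const auto& p : v) {
        out.first.push_back(p.first);
        out.second.push_back(p.second);
    }
    return out;
}
```

For the input `{3, 1, 4, 1, 5}`, `inclusive_prefix_sum` yields `{3, 4, 8, 9, 14}`, while `exclusive_prefix_sum` with `init = 0` yields `{0, 3, 4, 8, 9}`: the exclusive variant shifts the running totals right by one and starts from the initial value.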
Contributors: Paulino, Hervé; RUN
Subjects: Heterogeneous Computing; Marrow; CUDA; GPU; Scientific Domain/Area::Engineering and Technology::Electrical, Electronic and Computer Engineering
Dates: 2022-12; 2024-04-24
Version: Published version
Access: Open access
File format: PDF