A CUDA backend for Marrow and its Optimisation via Machine Learning

Valente, Pedro de Almeida Amaral Ramos

Bibliographic Details
Main Author:	Valente, Pedro de Almeida Amaral Ramos
Publication Date:	2022
Format:	Master thesis
Language:	eng
Source:	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full:	http://hdl.handle.net/10362/155512
Summary:	Industries ranging from business to science now collect, process and store massive amounts of data. Conventional CPUs, which are optimised for sequential performance, struggle to keep up with such volumes, whereas GPUs, designed for parallel computation, are well suited to the task. General-purpose GPU computing has become more popular in recent years because of the need for fast parallel processing, but developing programs that execute on the GPU can be difficult and time-consuming. Various high-level APIs that compile into GPU programs exist; however, because they abstract away lower-level concepts and lack algorithm-specific optimisations, they may not reach peak performance. Optimisation is a particularly interesting problem: optimisation patterns can rarely be applied uniformly across different algorithms, and manually tuning individual programs is extremely time-consuming. Machine learning compilation has gained attention in recent years, with good reason. The idea is to train a model with a machine learning algorithm and have it estimate how best to optimise an input program. Predicting the best optimisations for a program is much faster than finding them manually, and works that use this technique report that it can also yield better optimisations. In this thesis, we work with the Marrow framework and develop a CUDA-based backend for it, so that low-level GPU code can be generated. Additionally, we train a machine learning model and use it to automatically optimise the CUDA code generated from Marrow programs.

Item metadata

id	RCAP_62a7cd75e7ba738d375272e254f48850
oai_identifier_str	oai:run.unl.pt:10362/155512
network_acronym_str	RCAP
network_name_str	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str	https://opendoar.ac.uk/repository/7160
dc.title.none.fl_str_mv	A CUDA backend for Marrow and its Optimisation via Machine Learning
title	A CUDA backend for Marrow and its Optimisation via Machine Learning
spellingShingle	A CUDA backend for Marrow and its Optimisation via Machine Learning Valente, Pedro de Almeida Amaral Ramos code generation Marrow GPU CUDA machine learning compilation Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
title_short	A CUDA backend for Marrow and its Optimisation via Machine Learning
title_full	A CUDA backend for Marrow and its Optimisation via Machine Learning
title_fullStr	A CUDA backend for Marrow and its Optimisation via Machine Learning
title_full_unstemmed	A CUDA backend for Marrow and its Optimisation via Machine Learning
title_sort	A CUDA backend for Marrow and its Optimisation via Machine Learning
author	Valente, Pedro de Almeida Amaral Ramos
author_facet	Valente, Pedro de Almeida Amaral Ramos
author_role	author
dc.contributor.none.fl_str_mv	Paulino, Hervé; RUN
dc.contributor.author.fl_str_mv	Valente, Pedro de Almeida Amaral Ramos
dc.subject.por.fl_str_mv	code generation; Marrow; GPU; CUDA; machine learning; compilation; Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
topic	code generation; Marrow; GPU; CUDA; machine learning; compilation; Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
description	Industries ranging from business to science now collect, process and store massive amounts of data. Conventional CPUs, which are optimised for sequential performance, struggle to keep up with such volumes, whereas GPUs, designed for parallel computation, are well suited to the task. General-purpose GPU computing has become more popular in recent years because of the need for fast parallel processing, but developing programs that execute on the GPU can be difficult and time-consuming. Various high-level APIs that compile into GPU programs exist; however, because they abstract away lower-level concepts and lack algorithm-specific optimisations, they may not reach peak performance. Optimisation is a particularly interesting problem: optimisation patterns can rarely be applied uniformly across different algorithms, and manually tuning individual programs is extremely time-consuming. Machine learning compilation has gained attention in recent years, with good reason. The idea is to train a model with a machine learning algorithm and have it estimate how best to optimise an input program. Predicting the best optimisations for a program is much faster than finding them manually, and works that use this technique report that it can also yield better optimisations. In this thesis, we work with the Marrow framework and develop a CUDA-based backend for it, so that low-level GPU code can be generated. Additionally, we train a machine learning model and use it to automatically optimise the CUDA code generated from Marrow programs.
publishDate	2022
dc.date.none.fl_str_mv	2022-06 2022-06-01T00:00:00Z 2023-07-19T10:28:39Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10362/155512
url	http://hdl.handle.net/10362/155512
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP
instname_str	FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv	info@rcaap.pt
_version_	1833596920076959744

Similar Items