A framework for efficient execution of data parallel irregular applications on heterogeneous systems
Main Author: | |
---|---|
Publication Date: | 2015 |
Other Authors: | , |
Format: | Article |
Language: | eng |
Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
Download full: | http://hdl.handle.net/1822/29287 |
Summary: | Exploiting the computing power of the diversity of resources available on heterogeneous systems is mandatory but a very challenging task. The diversity of architectures, execution models and programming tools, together with disjoint address spaces and di erent computing capabilities, raise a number of challenges that severely impact on application performance and programming productivity. This problem is further compounded in the presence of data parallel irregular applications. This paper presents a framework that addresses development and execution of data parallel irregular applications in heterogeneous systems. A uni ed task-based programming and execution model is proposed, together with inter and intra-device scheduling, which, coupled with a data management system, aim to achieve performance scalability across multiple devices, while maintaining high programming productivity. Intradevice scheduling on wide SIMD/SIMT architectures resorts to consumer-producer kernels, which, by allowing dynamic generation and rescheduling of new work units, enable balancing irregular workloads and increase resource utilization. Results show that regular and irregular applications scale well with the number of devices, while requiring minimal programming e ort. Consumer-producer kernels are able to sustain signi cant performance gains as long as the workload per basic work unit is enough to compensate overheads associated with intra-device scheduling. This not being the case, consumer kernels can still be used for the irregular application. Comparisons with an alternative framework, StarPU, which targets regular workloads, consistently demonstrate signi cant speedups. This is, to the best of our knowledge, the rst published integrated approach that successfully handles irregular workloads over heterogeneous systems. |
id |
RCAP_67df41f7ab73872595d78a6c88f106cd |
---|---|
oai_identifier_str |
oai:repositorium.sdum.uminho.pt:1822/29287 |
network_acronym_str |
RCAP |
network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository_id_str |
https://opendoar.ac.uk/repository/7160 |
spelling |
A framework for efficient execution of data parallel irregular applications on heterogeneous systemsHeterogeneous systemsIrregular applicationsEfficiencyProgramming productivityExploiting the computing power of the diversity of resources available on heterogeneous systems is mandatory but a very challenging task. The diversity of architectures, execution models and programming tools, together with disjoint address spaces and di erent computing capabilities, raise a number of challenges that severely impact on application performance and programming productivity. This problem is further compounded in the presence of data parallel irregular applications. This paper presents a framework that addresses development and execution of data parallel irregular applications in heterogeneous systems. A uni ed task-based programming and execution model is proposed, together with inter and intra-device scheduling, which, coupled with a data management system, aim to achieve performance scalability across multiple devices, while maintaining high programming productivity. Intradevice scheduling on wide SIMD/SIMT architectures resorts to consumer-producer kernels, which, by allowing dynamic generation and rescheduling of new work units, enable balancing irregular workloads and increase resource utilization. Results show that regular and irregular applications scale well with the number of devices, while requiring minimal programming e ort. Consumer-producer kernels are able to sustain signi cant performance gains as long as the workload per basic work unit is enough to compensate overheads associated with intra-device scheduling. This not being the case, consumer kernels can still be used for the irregular application. Comparisons with an alternative framework, StarPU, which targets regular workloads, consistently demonstrate signi cant speedups. This is, to the best of our knowledge, the rst published integrated approach that successfully handles irregular workloads over heterogeneous systems.This work is funded by National Funds through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) and by ERDF - European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness) within projects PEst-OE/EEI/UI0752/2014 and FCOMP-01-0124-FEDER-010067. Also by the School of Engineering, Universidade do Minho within project P2SHOCS - Performance Portability on Scalable Heterogeneous Computing Systems.World Scientific PublishingUniversidade do MinhoRibeiro, RobertoBarbosa, JoãoSantos, Luís Paulo20152015-01-01T00:00:00Zinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/articleapplication/pdfhttp://hdl.handle.net/1822/29287engRoberto Ribeiro, João Barbosa, and Luís Paulo Santos, Parallel Process. Lett. 25, 15500040129-626410.1142/S0129626415500048info:eu-repo/semantics/openAccessreponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiainstacron:RCAAP2024-05-11T05:36:35Zoai:repositorium.sdum.uminho.pt:1822/29287Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireinfo@rcaap.ptopendoar:https://opendoar.ac.uk/repository/71602025-05-28T15:23:59.510034Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiafalse |
dc.title.none.fl_str_mv |
A framework for efficient execution of data parallel irregular applications on heterogeneous systems |
title |
A framework for efficient execution of data parallel irregular applications on heterogeneous systems |
spellingShingle |
A framework for efficient execution of data parallel irregular applications on heterogeneous systems Ribeiro, Roberto Heterogeneous systems Irregular applications Efficiency Programming productivity |
title_short |
A framework for efficient execution of data parallel irregular applications on heterogeneous systems |
title_full |
A framework for efficient execution of data parallel irregular applications on heterogeneous systems |
title_fullStr |
A framework for efficient execution of data parallel irregular applications on heterogeneous systems |
title_full_unstemmed |
A framework for efficient execution of data parallel irregular applications on heterogeneous systems |
title_sort |
A framework for efficient execution of data parallel irregular applications on heterogeneous systems |
author |
Ribeiro, Roberto |
author_facet |
Ribeiro, Roberto Barbosa, João Santos, Luís Paulo |
author_role |
author |
author2 |
Barbosa, João Santos, Luís Paulo |
author2_role |
author author |
dc.contributor.none.fl_str_mv |
Universidade do Minho |
dc.contributor.author.fl_str_mv |
Ribeiro, Roberto Barbosa, João Santos, Luís Paulo |
dc.subject.por.fl_str_mv |
Heterogeneous systems Irregular applications Efficiency Programming productivity |
topic |
Heterogeneous systems Irregular applications Efficiency Programming productivity |
description |
Exploiting the computing power of the diversity of resources available on heterogeneous systems is mandatory but a very challenging task. The diversity of architectures, execution models and programming tools, together with disjoint address spaces and di erent computing capabilities, raise a number of challenges that severely impact on application performance and programming productivity. This problem is further compounded in the presence of data parallel irregular applications. This paper presents a framework that addresses development and execution of data parallel irregular applications in heterogeneous systems. A uni ed task-based programming and execution model is proposed, together with inter and intra-device scheduling, which, coupled with a data management system, aim to achieve performance scalability across multiple devices, while maintaining high programming productivity. Intradevice scheduling on wide SIMD/SIMT architectures resorts to consumer-producer kernels, which, by allowing dynamic generation and rescheduling of new work units, enable balancing irregular workloads and increase resource utilization. Results show that regular and irregular applications scale well with the number of devices, while requiring minimal programming e ort. Consumer-producer kernels are able to sustain signi cant performance gains as long as the workload per basic work unit is enough to compensate overheads associated with intra-device scheduling. This not being the case, consumer kernels can still be used for the irregular application. Comparisons with an alternative framework, StarPU, which targets regular workloads, consistently demonstrate signi cant speedups. This is, to the best of our knowledge, the rst published integrated approach that successfully handles irregular workloads over heterogeneous systems. |
publishDate |
2015 |
dc.date.none.fl_str_mv |
2015 2015-01-01T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/1822/29287 |
url |
http://hdl.handle.net/1822/29287 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
Roberto Ribeiro, João Barbosa, and Luís Paulo Santos, Parallel Process. Lett. 25, 1550004 0129-6264 10.1142/S0129626415500048 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
World Scientific Publishing |
publisher.none.fl_str_mv |
World Scientific Publishing |
dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP |
instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
repository.mail.fl_str_mv |
info@rcaap.pt |
_version_ |
1833595291249410048 |