Adding native support for task scheduling to a Linux-capable RISC-V multicore system

Morais, Lucas Henrique

Adding native support for task scheduling to a Linux-capable RISC-V multicore system

Bibliographic Details
Main Author:	Morais, Lucas Henrique
Publication Date:	2019
Format:	Master thesis
Language:	eng
Source:	Biblioteca Digital de Teses e Dissertações da USP
Download full:	http://www.teses.usp.br/teses/disponiveis/45/45134/tde-28092019-060958/
Summary:	The Task Scheduling Paradigm is a general technique for leveraging fine and coarse grain parallelism from applications of several domains with minimum impact on code readability, relying on the automatic inference of data dependencies among tasks. The performance of Task Parallel applications is correlated with the speed at which the underlying Task Scheduling System is able to detect such dependencies, something that is critical for fine-granularity workloads, which cannot amortize scheduling overheads with long periods of useful computation. That being the case, several groups have recently been developing FPGA-accelerated Task Scheduling Systems architectures where a software Task Scheduling Runtime is able to offload its bookkeeping computations to an FPGA-based accelerator with the goal of efficiently scheduling fine-grained tasks to CPU cores. Even though these FPGA-accelerated systems offer substantial gains over the software-only baseline, it is also true that FPGA-CPU communication bottlenecks prevent such designs from handling scenarios with either large number of cores or very fine-grained tasks. With that in mind, we proposed the implementation of a Native Task Scheduling System that is, a processor with native support for task scheduling embedded into its architecture with the goal of substantially reducing these overheads. More specifically, this project aimed at embedding the HW logic of Picos, a mature Task Scheduling Accelerator developed by the Barcelona Supercomputing Center (BSC), into Rocket Chip, an open-source, silicon-proven, multi-core implementation of RISC-V. The ISA of the resulting system provides special instructions for Task Applications to interact with this Task Scheduling Logic, ruling out all FPGA-CPU communication latencies. To evaluate the prototype performance, we both (1) adapted Nanos, a mature Task Scheduling runtime, to benefit from the new task-scheduling-accelerating instructions; and (2) developed Phentos, a new HW-accelerated light weight Task Scheduling runtime. Our experiments show that task parallel programs using Nanos-RV the Nanos version ported to our system are on average 2.13 times faster than those being serviced by baseline Nanos, while programs running on Phentos are 13.19 times faster, considering geometric means. Using eight cores, Nanos-RV is able to deliver speedups with respect to serial execution of up to 5.62 times, while Phentos produces speedups of up to 5.72 times.

Item metadata

id	USP_c05190fdd8dc7e57a72d4a1c6b80fdbc
oai_identifier_str	oai:teses.usp.br:tde-28092019-060958
network_acronym_str	USP
network_name_str	Biblioteca Digital de Teses e Dissertações da USP
repository_id_str	2721
spelling	Adding native support for task scheduling to a Linux-capable RISC-V multicore systemAdicionando suporte nativo a paralelismo de tarefas a um sistema RISC-V multicore com suporte a LinuxChiselChiselParalelismo por tarefasParallel programmingProgramação paralelaRISC-VRISC-VRocket chipRocket chipTask schedulingThe Task Scheduling Paradigm is a general technique for leveraging fine and coarse grain parallelism from applications of several domains with minimum impact on code readability, relying on the automatic inference of data dependencies among tasks. The performance of Task Parallel applications is correlated with the speed at which the underlying Task Scheduling System is able to detect such dependencies, something that is critical for fine-granularity workloads, which cannot amortize scheduling overheads with long periods of useful computation. That being the case, several groups have recently been developing FPGA-accelerated Task Scheduling Systems architectures where a software Task Scheduling Runtime is able to offload its bookkeeping computations to an FPGA-based accelerator with the goal of efficiently scheduling fine-grained tasks to CPU cores. Even though these FPGA-accelerated systems offer substantial gains over the software-only baseline, it is also true that FPGA-CPU communication bottlenecks prevent such designs from handling scenarios with either large number of cores or very fine-grained tasks. With that in mind, we proposed the implementation of a Native Task Scheduling System that is, a processor with native support for task scheduling embedded into its architecture with the goal of substantially reducing these overheads. More specifically, this project aimed at embedding the HW logic of Picos, a mature Task Scheduling Accelerator developed by the Barcelona Supercomputing Center (BSC), into Rocket Chip, an open-source, silicon-proven, multi-core implementation of RISC-V. The ISA of the resulting system provides special instructions for Task Applications to interact with this Task Scheduling Logic, ruling out all FPGA-CPU communication latencies. To evaluate the prototype performance, we both (1) adapted Nanos, a mature Task Scheduling runtime, to benefit from the new task-scheduling-accelerating instructions; and (2) developed Phentos, a new HW-accelerated light weight Task Scheduling runtime. Our experiments show that task parallel programs using Nanos-RV the Nanos version ported to our system are on average 2.13 times faster than those being serviced by baseline Nanos, while programs running on Phentos are 13.19 times faster, considering geometric means. Using eight cores, Nanos-RV is able to deliver speedups with respect to serial execution of up to 5.62 times, while Phentos produces speedups of up to 5.72 times.Paralelismo por Tarefas é uma técnica genérica de extração de paralelismo de granularidade arbitrária aplicável a programas de vários domínios, com mínimo impacto sobre legibilidade de código, baseada na inferência automática de dependências de dados entre tarefas. O desempenho de aplicações paralelas baseadas nesse paradigma depende da velocidade com a qual o runtime de Paralelismo por Tarefas que lhe dá suporte é capaz de detectar tais dependências, fato que é ainda mais crítico para aplicações envolvendo tarefas de granularidade fina, já que nesse cenário o overhead de escalonamento não é amortizado por períodos significativamente maiores de computação útil. Recentemente, diversos grupos têm desenvolvido Sistemas de Suporte a Paralelismo por Tarefas acelerados por FPGAs, os quais são capazes de fazer offload das operações de inferência de dependências para um acelerador em FPGA de modo a melhorar o seu desempenho ao lidar com tarefas de granularidade fina. Por outro lado, ainda que esses sistemas acelerados por FPGA apresentem ganhos substanciais com relação às alternativas baseadas puramente em software, o desempenho dessas soluções é prejudicado por gargalos de comunicação entre a CPU e a FPGA, os quais limitam a capacidade desses sistemas de lidar com cenários envolvendo grande número de núcleos ou tarefas muito finas. Motivados por isso, implementamos um Sistema de Suporte Nativo a Paralelismo por Tarefas isto é, um processador com suporte arquitetural nativo a Paralelismo por Tarefas com o objetivo de reduzir consideravelmente tais overheads de comunicação. Mais especificamente, integramos a lógica em hardware do Picos, um acelerador de Paralelismo por Tarefas desenvolvido pelo Barcelona Supercomputing Center (BSC), ao Rocket Chip, uma implementação multi-core de código livre do RISC-V desenvolvida pela Universidade da Califórnia, Berkeley. O sistema resultante contém em sua ISA (Instruction Set Architecture) as instruções necessárias para que aplicações baseadas em tarefas possam interagir diretamente com essa lógica de escalonamento, minimizando os overheads associados ao uso de runtimes intermediários e eliminando toda a latência de comunicação FPGA-CPU. Para avaliar a performance do protótipo que então se construiu, nós tanto (1) adaptamos o runtime de escalonamento de tarefas Nanos para que ele pudesse ser acelerado pelas novas instruções de escalonamento de tarefas, quanto (2) criamos um novo runtime leve de escalonamento de tarefas a que demos o nome de Phentos. Nossos experimentos mostram que programas baseados em paralelismo por tarefas usando o runtime Nanos-RV a versão do runtime Nanos com suporte ao sistema que produzimos são executados em média 2,13 vezes mais rapidamente do que versões dos mesmos programas utilizando a versão básica do Nanos, enquanto programas executados com o Phentos são em média 13,19 vezes mais rápidos do que suas versões correspondentes baseadas na mesma versão básica do Nanos. Tais valores médios correspondem à média geométrica dos conjuntos de dados pertinentes. Usando oito núcleos, Nanos-RV entrega ganhos de desempenho com relação a execuções seriais de até 5,62 vezes, enquanto Phentos entrega ganhos de até 5,72 vezes.Biblioteca Digitais de Teses e Dissertações da USPAraújo, Guido Costa Souza deLejbman, Alfredo Goldman VelMorais, Lucas Henrique2019-08-22info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesisapplication/pdfhttp://www.teses.usp.br/teses/disponiveis/45/45134/tde-28092019-060958/reponame:Biblioteca Digital de Teses e Dissertações da USPinstname:Universidade de São Paulo (USP)instacron:USPLiberar o conteúdo para acesso público.info:eu-repo/semantics/openAccesseng2019-11-08T23:43:27Zoai:teses.usp.br:tde-28092019-060958Biblioteca Digital de Teses e Dissertaçõeshttp://www.teses.usp.br/PUBhttp://www.teses.usp.br/cgi-bin/mtd2br.plvirginia@if.usp.br\|\| atendimento@aguia.usp.br\|\|virginia@if.usp.bropendoar:27212019-11-08T23:43:27Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)false
dc.title.none.fl_str_mv	Adding native support for task scheduling to a Linux-capable RISC-V multicore system Adicionando suporte nativo a paralelismo de tarefas a um sistema RISC-V multicore com suporte a Linux
title	Adding native support for task scheduling to a Linux-capable RISC-V multicore system
spellingShingle	Adding native support for task scheduling to a Linux-capable RISC-V multicore system Morais, Lucas Henrique Chisel Chisel Paralelismo por tarefas Parallel programming Programação paralela RISC-V RISC-V Rocket chip Rocket chip Task scheduling
title_short	Adding native support for task scheduling to a Linux-capable RISC-V multicore system
title_full	Adding native support for task scheduling to a Linux-capable RISC-V multicore system
title_fullStr	Adding native support for task scheduling to a Linux-capable RISC-V multicore system
title_full_unstemmed	Adding native support for task scheduling to a Linux-capable RISC-V multicore system
title_sort	Adding native support for task scheduling to a Linux-capable RISC-V multicore system
author	Morais, Lucas Henrique
author_facet	Morais, Lucas Henrique
author_role	author
dc.contributor.none.fl_str_mv	Araújo, Guido Costa Souza de Lejbman, Alfredo Goldman Vel
dc.contributor.author.fl_str_mv	Morais, Lucas Henrique
dc.subject.por.fl_str_mv	Chisel Chisel Paralelismo por tarefas Parallel programming Programação paralela RISC-V RISC-V Rocket chip Rocket chip Task scheduling
topic	Chisel Chisel Paralelismo por tarefas Parallel programming Programação paralela RISC-V RISC-V Rocket chip Rocket chip Task scheduling
description	The Task Scheduling Paradigm is a general technique for leveraging fine and coarse grain parallelism from applications of several domains with minimum impact on code readability, relying on the automatic inference of data dependencies among tasks. The performance of Task Parallel applications is correlated with the speed at which the underlying Task Scheduling System is able to detect such dependencies, something that is critical for fine-granularity workloads, which cannot amortize scheduling overheads with long periods of useful computation. That being the case, several groups have recently been developing FPGA-accelerated Task Scheduling Systems architectures where a software Task Scheduling Runtime is able to offload its bookkeeping computations to an FPGA-based accelerator with the goal of efficiently scheduling fine-grained tasks to CPU cores. Even though these FPGA-accelerated systems offer substantial gains over the software-only baseline, it is also true that FPGA-CPU communication bottlenecks prevent such designs from handling scenarios with either large number of cores or very fine-grained tasks. With that in mind, we proposed the implementation of a Native Task Scheduling System that is, a processor with native support for task scheduling embedded into its architecture with the goal of substantially reducing these overheads. More specifically, this project aimed at embedding the HW logic of Picos, a mature Task Scheduling Accelerator developed by the Barcelona Supercomputing Center (BSC), into Rocket Chip, an open-source, silicon-proven, multi-core implementation of RISC-V. The ISA of the resulting system provides special instructions for Task Applications to interact with this Task Scheduling Logic, ruling out all FPGA-CPU communication latencies. To evaluate the prototype performance, we both (1) adapted Nanos, a mature Task Scheduling runtime, to benefit from the new task-scheduling-accelerating instructions; and (2) developed Phentos, a new HW-accelerated light weight Task Scheduling runtime. Our experiments show that task parallel programs using Nanos-RV the Nanos version ported to our system are on average 2.13 times faster than those being serviced by baseline Nanos, while programs running on Phentos are 13.19 times faster, considering geometric means. Using eight cores, Nanos-RV is able to deliver speedups with respect to serial execution of up to 5.62 times, while Phentos produces speedups of up to 5.72 times.
publishDate	2019
dc.date.none.fl_str_mv	2019-08-22
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://www.teses.usp.br/teses/disponiveis/45/45134/tde-28092019-060958/
url	http://www.teses.usp.br/teses/disponiveis/45/45134/tde-28092019-060958/
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv	Liberar o conteúdo para acesso público. info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Liberar o conteúdo para acesso público.
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv	Biblioteca Digitais de Teses e Dissertações da USP
publisher.none.fl_str_mv	Biblioteca Digitais de Teses e Dissertações da USP
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP
instname_str	Universidade de São Paulo (USP)
instacron_str	USP
institution	USP
reponame_str	Biblioteca Digital de Teses e Dissertações da USP
collection	Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv	virginia@if.usp.br\|\| atendimento@aguia.usp.br\|\|virginia@if.usp.br
_version_	1826319126693412864

Adding native support for task scheduling to a Linux-capable RISC-V multicore system

Similar Items