Scheduling computations

Rito, Guilherme Miguel Teixeira

Bibliographic Details
Main Author:	Rito, Guilherme Miguel Teixeira
Publication Date:	2016
Format:	Master thesis
Language:	eng
Source:	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full:	http://hdl.handle.net/10362/166873
Summary:	For quite some time, the Work Stealing algorithm has been the de facto standard for scheduling multithreaded computations. To ensure scalability and achieve high performance, work is scattered across processors. Each processor owns a concurrent work queue that it uses to keep track of its assigned tasks. When a processor’s work queue becomes empty, the processor becomes a thief and starts targeting victims uniformly at random, from which it attempts to steal tasks. This strategy has been proved efficient in both theory and practice, and is currently used in state-of-the-art Work Stealing algorithms. Nevertheless, purely receiver-initiated load balancing schemes, such as Work Stealing’s, are known to be unsuitable for scheduling computations with little or unbalanced parallelism. Moreover, due to the concurrent nature of work queues, even local operations require memory fences, which are extremely expensive on modern computer architectures. Consequently, even when a processor is busy, it may incur costly overheads caused by local accesses to its work queue. Finally, as the scheduler’s load balancer relies on random steals, its performance when executing memory-bound computations is very limited. Despite all efforts, no silver bullet has been found and, even worse, all these limitations still exist in state-of-the-art Work Stealing algorithms. In this thesis we make three major theoretical contributions, addressing each of the aforementioned limitations. First, we prove that Work Stealing can easily be extended to make use of custom load balancers that, for various classes of workloads (e.g. memory-bound computations), can greatly boost the scheduler’s performance while maintaining Work Stealing’s high performance in the general setting. Then, we present a provably efficient scheduler that mixes receiver-initiated and sender-initiated policies, and theoretically show that it successfully overcomes Work Stealing’s limitations for the execution of computations with little or irregular parallelism. Finally, we present a novel scheduling algorithm whose expected runtime bounds are optimal within a constant factor and that avoids most of the costs associated with memory fences, bounding the total expected overheads incurred by memory fences to O(P T∞), where T∞ is the critical-path length of a computation and P is the number of processors. This contrasts with state-of-the-art Work Stealing algorithms, where the total overheads incurred by these synchronization mechanisms can grow proportionally with the total amount of work. From this perspective, our proposal greatly improves on the state-of-the-art Work Stealing algorithm. In fact, as we will prove, for several classes of computations, the overheads incurred by our algorithm are exponentially smaller than the overheads incurred by state-of-the-art Work Stealing algorithms.
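
Illustrative note: the baseline behaviour the summary describes (each processor works from its own queue and, once that queue is empty, steals from a victim chosen uniformly at random) can be sketched as a small round-based simulation. This is only a sketch of classic randomized work stealing under stated assumptions, not any of the thesis's algorithms: it runs sequentially, so it models neither real concurrency nor memory fences; the function name `simulate_work_stealing`, its parameters, and the choice that owners and thieves take from opposite ends of the queue are assumptions made for illustration.

```python
import random
from collections import deque


def simulate_work_stealing(task_counts, seed=0):
    """Round-based, sequential sketch of randomized work stealing.

    task_counts[i] is the number of unit tasks initially assigned to
    processor i (hypothetical input format). Returns a tuple
    (rounds, steal_attempts, successful_steals).
    """
    rng = random.Random(seed)
    queues = [deque(range(n)) for n in task_counts]
    p = len(queues)
    total = sum(task_counts)
    executed = 0
    rounds = attempts = steals = 0

    while executed < total:
        rounds += 1
        for i in range(p):
            if queues[i]:
                # Busy processor: run one task from its own queue.
                queues[i].pop()
                executed += 1
            elif p > 1:
                # Idle processor becomes a thief and picks a victim
                # uniformly at random among the other processors.
                victim = rng.choice([j for j in range(p) if j != i])
                attempts += 1
                if queues[victim]:
                    # Assumed convention: steal from the opposite end.
                    queues[i].append(queues[victim].popleft())
                    steals += 1
    return rounds, attempts, steals


if __name__ == "__main__":
    # All work starts on one processor; the others must steal.
    print(simulate_work_stealing([8, 0, 0, 0]))
```

With perfectly balanced input no processor is ever idle at its turn, so no steal attempts occur; with all work on one processor, the idle processors repeatedly attempt steals, which is the purely receiver-initiated behaviour the summary identifies as a weakness for computations with little parallelism.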

Item metadata

id	RCAP_f1dcb95a8d50a929a274d3e29bd19f71
oai_identifier_str	oai:run.unl.pt:10362/166873
network_acronym_str	RCAP
network_name_str	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str	https://opendoar.ac.uk/repository/7160
dc.title.none.fl_str_mv	Scheduling computations
title	Scheduling computations
spellingShingle	Scheduling computations Rito, Guilherme Miguel Teixeira Scheduling algorithms Randomized algorithms Parallel computing Distributed computing Dynamic load balancing Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
title_short	Scheduling computations
title_full	Scheduling computations
title_fullStr	Scheduling computations
title_full_unstemmed	Scheduling computations
title_sort	Scheduling computations
author	Rito, Guilherme Miguel Teixeira
author_facet	Rito, Guilherme Miguel Teixeira
author_role	author
dc.contributor.none.fl_str_mv	Paulino, Hervé; RUN
dc.contributor.author.fl_str_mv	Rito, Guilherme Miguel Teixeira
dc.subject.por.fl_str_mv	Scheduling algorithms; Randomized algorithms; Parallel computing; Distributed computing; Dynamic load balancing; Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
topic	Scheduling algorithms; Randomized algorithms; Parallel computing; Distributed computing; Dynamic load balancing; Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
publishDate	2016
dc.date.none.fl_str_mv	2016-11 2016-11-01T00:00:00Z 2024-05-02T15:22:58Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10362/166873
url	http://hdl.handle.net/10362/166873
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP
instname_str	FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv	info@rcaap.pt
_version_	1833597016847941632

Similar Items