Scheduling computations
Main Author: | Rito, Guilherme Miguel Teixeira |
---|---|
Publication Date: | 2016 |
Format: | Master thesis |
Language: | eng |
Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
Download full: | http://hdl.handle.net/10362/166873 |
Summary: | For quite some time, the Work Stealing algorithm has been the de facto standard for scheduling multithreaded computations. To ensure scalability and achieve high performance, work is scattered across processors. Each processor owns a concurrent work queue that it uses to keep track of its assigned tasks. When a processor’s work queue becomes empty, it becomes a thief and starts targeting victims uniformly at random, from which it attempts to steal tasks. This strategy has been proved efficient in both theory and practice, and is currently used in state-of-the-art Work Stealing algorithms. Nevertheless, purely receiver-initiated load balancing schemes, such as Work Stealing’s, are known to be unsuitable for scheduling computations with little or unbalanced parallelism. Moreover, due to the concurrent nature of work queues, even local operations require memory fences, which are extremely expensive on modern computer architectures. Consequently, even when a processor is busy, it may incur costly overheads caused by local accesses to its own work queue. Finally, as the scheduler’s load balancer relies on random steals, its performance when executing memory-bound computations is very limited. Despite all efforts, no silver bullet has been found and, even worse, all these limitations persist in state-of-the-art Work Stealing algorithms. In this thesis we make three major theoretical contributions, each addressing one of the aforementioned limitations. First, we prove that Work Stealing can easily be extended to use custom load balancers that, for various classes of workloads (e.g. memory-bound computations), can greatly boost the scheduler’s performance while maintaining Work Stealing’s high performance in the general setting. Then, we present a provably efficient scheduler that mixes receiver- and sender-initiated policies, and we show theoretically that it overcomes Work Stealing’s limitations for the execution of computations with little or irregular parallelism. Finally, we present a novel scheduling algorithm whose expected runtime bounds are optimal within a constant factor and that avoids most of the costs associated with memory fences, bounding the total expected overhead incurred by memory fences to O(P T∞), where T∞ is the critical-path length of the computation and P is the number of processors. This contrasts with state-of-the-art Work Stealing algorithms, where the total overhead incurred by these synchronization mechanisms can grow proportionally with the total amount of work. From this perspective, our proposal greatly improves upon state-of-the-art Work Stealing algorithms; in fact, as we prove, for several classes of computations the overheads incurred by our algorithm are exponentially smaller than those incurred by state-of-the-art Work Stealing algorithms. |
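The summary above describes the Work Stealing discipline in prose. As an illustration only (this is not code from the thesis), the minimal Python sketch below shows that discipline: each worker pops tasks from its own queue and, once the queue runs dry, becomes a thief that picks a victim uniformly at random and steals from the opposite end of the victim's queue. All names and the coarse per-queue locks are assumptions made for the sketch; the locks stand in for the lock-free deques and memory fences the summary refers to.

```python
# Minimal work-stealing sketch (illustrative; not the thesis's algorithm).
# Each worker pops spawned tasks locally from the LIFO end of its own queue;
# when the queue is empty it becomes a thief, picks a victim uniformly at
# random, and steals from the opposite (FIFO) end of the victim's queue.
import random
import threading
from collections import deque

P = 4                                          # number of workers ("processors")
N_TASKS = 32                                   # total number of tasks to run
queues = [deque() for _ in range(P)]           # one work queue per worker
locks = [threading.Lock() for _ in range(P)]   # coarse lock per queue (stand-in for a lock-free deque)
completed = 0
completed_lock = threading.Lock()

def worker(wid):
    global completed
    while True:
        with completed_lock:
            if completed >= N_TASKS:           # all tasks executed: stop
                return
        task = None
        with locks[wid]:
            if queues[wid]:
                task = queues[wid].pop()       # local access: pop from the LIFO end
        if task is None:                       # empty queue: become a thief
            victim = random.randrange(P)       # victim chosen uniformly at random
            if victim != wid:
                with locks[victim]:
                    if queues[victim]:
                        task = queues[victim].popleft()  # steal from the FIFO end
        if task is not None:
            task(wid)
            with completed_lock:
                completed += 1

def make_task(i):
    return lambda wid: print(f"task {i} ran on worker {wid}")

# Seed all tasks on worker 0 so that the remaining workers must steal.
queues[0].extend(make_task(i) for i in range(N_TASKS))

threads = [threading.Thread(target=worker, args=(w,)) for w in range(P)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In an actual Work Stealing runtime, the per-queue lock would typically be replaced by a lock-free deque such as the Chase–Lev deque, whose local pop requires a memory fence — the local-access overhead the summary refers to.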
id
RCAP_f1dcb95a8d50a929a274d3e29bd19f71
oai_identifier_str
oai:run.unl.pt:10362/166873
network_acronym_str
RCAP
network_name_str
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str
https://opendoar.ac.uk/repository/7160
dc.title.none.fl_str_mv
Scheduling computations
title
Scheduling computations
spellingShingle
Scheduling computations; Rito, Guilherme Miguel Teixeira; Scheduling algorithms; Randomized algorithms; Parallel computing; Distributed computing; Dynamic load balancing; Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
title_short
Scheduling computations
title_full
Scheduling computations
title_fullStr
Scheduling computations
title_full_unstemmed
Scheduling computations
title_sort
Scheduling computations
author
Rito, Guilherme Miguel Teixeira
author_facet
Rito, Guilherme Miguel Teixeira
author_role
author
dc.contributor.none.fl_str_mv
Paulino, Hervé; RUN
dc.contributor.author.fl_str_mv
Rito, Guilherme Miguel Teixeira
dc.subject.por.fl_str_mv
Scheduling algorithms; Randomized algorithms; Parallel computing; Distributed computing; Dynamic load balancing; Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
topic
Scheduling algorithms; Randomized algorithms; Parallel computing; Distributed computing; Dynamic load balancing; Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
description
For quite some time, the Work Stealing algorithm has been the de facto standard for scheduling multithreaded computations. To ensure scalability and achieve high performance, work is scattered across processors. Each processor owns a concurrent work queue that it uses to keep track of its assigned tasks. When a processor’s work queue becomes empty, it becomes a thief and starts targeting victims uniformly at random, from which it attempts to steal tasks. This strategy has been proved efficient in both theory and practice, and is currently used in state-of-the-art Work Stealing algorithms. Nevertheless, purely receiver-initiated load balancing schemes, such as Work Stealing’s, are known to be unsuitable for scheduling computations with little or unbalanced parallelism. Moreover, due to the concurrent nature of work queues, even local operations require memory fences, which are extremely expensive on modern computer architectures. Consequently, even when a processor is busy, it may incur costly overheads caused by local accesses to its own work queue. Finally, as the scheduler’s load balancer relies on random steals, its performance when executing memory-bound computations is very limited. Despite all efforts, no silver bullet has been found and, even worse, all these limitations persist in state-of-the-art Work Stealing algorithms. In this thesis we make three major theoretical contributions, each addressing one of the aforementioned limitations. First, we prove that Work Stealing can easily be extended to use custom load balancers that, for various classes of workloads (e.g. memory-bound computations), can greatly boost the scheduler’s performance while maintaining Work Stealing’s high performance in the general setting. Then, we present a provably efficient scheduler that mixes receiver- and sender-initiated policies, and we show theoretically that it overcomes Work Stealing’s limitations for the execution of computations with little or irregular parallelism. Finally, we present a novel scheduling algorithm whose expected runtime bounds are optimal within a constant factor and that avoids most of the costs associated with memory fences, bounding the total expected overhead incurred by memory fences to O(P T∞), where T∞ is the critical-path length of the computation and P is the number of processors. This contrasts with state-of-the-art Work Stealing algorithms, where the total overhead incurred by these synchronization mechanisms can grow proportionally with the total amount of work. From this perspective, our proposal greatly improves upon state-of-the-art Work Stealing algorithms; in fact, as we prove, for several classes of computations the overheads incurred by our algorithm are exponentially smaller than those incurred by state-of-the-art Work Stealing algorithms.
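For reference, the bound quoted in the description can be read against the classical Work Stealing runtime bound. The LaTeX note below uses the standard notation (T1 for total work, T∞ for critical-path length, P for processors); it restates known results together with the description's claim, and is not an excerpt from the thesis.

```latex
% Standard notation (assumed): T_1 = total work, T_\infty = critical-path length,
% P = number of processors.
% Classical Work Stealing runs a computation in expected time
\[
  \mathbb{E}[T_P] \;=\; \frac{T_1}{P} + O(T_\infty),
\]
% which is optimal to within a constant factor. The description claims that the
% proposed scheduler preserves such a bound while limiting the total expected
% memory-fence overhead to
\[
  O(P\,T_\infty),
\]
% whereas in existing Work Stealing implementations that overhead can grow
% proportionally with the total work T_1.
```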
publishDate
2016
dc.date.none.fl_str_mv
2016-11; 2016-11-01T00:00:00Z; 2024-05-02T15:22:58Z
dc.type.status.fl_str_mv
info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv
info:eu-repo/semantics/masterThesis
format
masterThesis
status_str
publishedVersion
dc.identifier.uri.fl_str_mv
http://hdl.handle.net/10362/166873
url
http://hdl.handle.net/10362/166873
dc.language.iso.fl_str_mv
eng
language
eng
dc.rights.driver.fl_str_mv
info:eu-repo/semantics/openAccess
eu_rights_str_mv
openAccess
dc.format.none.fl_str_mv
application/pdf
dc.source.none.fl_str_mv
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP); instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; instacron:RCAAP
instname_str
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str
RCAAP
institution
RCAAP
reponame_str
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv
info@rcaap.pt
_version_
1833597016847941632