Job Scheduling for Multi-User MapReduce Clusters

Matei Zaharia, Dhruba Borthakur, Joydeep Sen Sarma, Khaled Elmeleegy, Scott Shenker and Ion Stoica

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2009-55

April 30, 2009

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-55.pdf

Sharing a MapReduce cluster between users is attractive because it enables statistical multiplexing (lowering costs) and allows users to share a common large data set. However, we find that traditional scheduling algorithms can perform very poorly in MapReduce due to two aspects of the MapReduce setting: the need for data locality (running computation where the data is) and the dependence between map and reduce tasks. We illustrate these problems through our experience designing a fair scheduler for MapReduce at Facebook, which runs a 600-node multi-user data warehouse on Hadoop. We developed two simple techniques, delay scheduling and copy-compute splitting, which improve throughput and response times by factors of 2 to 10. Although we focus on multi-user workloads, our techniques can also raise throughput in a single-user, FIFO workload by a factor of 2.
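The core idea behind delay scheduling, as named in the abstract, can be sketched as follows. This is an illustrative sketch, not the authors' implementation: when the job at the head of the fair-share queue cannot launch a data-local task on a freed node, the scheduler skips it for a bounded number of scheduling opportunities before relaxing the locality requirement. The `MAX_SKIPS` threshold and the data structures here are assumptions for illustration.

```python
MAX_SKIPS = 3  # assumed skip budget; the real scheduler tunes this as a wait time


def assign_task(jobs, free_node):
    """Pick a task for free_node, preferring data-local launches.

    jobs: list of dicts ordered by the fairness policy, each with
          'pending' (list of tasks) and 'skipcount' (times skipped).
    Each task is a dict with 'input_nodes': the set of nodes holding
    its input data.  Returns (job, task, is_local) or None.
    """
    for job in jobs:
        # First preference: a task whose input data lives on free_node.
        for task in job['pending']:
            if free_node in task['input_nodes']:
                job['pending'].remove(task)
                job['skipcount'] = 0
                return job, task, True
        # No local task available: delay this job unless it has already
        # been skipped MAX_SKIPS times, in which case launch non-locally.
        if job['pending'] and job['skipcount'] >= MAX_SKIPS:
            task = job['pending'].pop(0)
            job['skipcount'] = 0
            return job, task, False
        job['skipcount'] += 1
    return None
```

The point of the skip budget is that on a large cluster, slots free up frequently, so a short wait almost always yields a data-local slot, while the bound keeps a job from starving when its data happens to sit on busy nodes.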


BibTeX citation:

@techreport{Zaharia:EECS-2009-55,
    Author= {Zaharia, Matei and Borthakur, Dhruba and Sen Sarma, Joydeep and Elmeleegy, Khaled and Shenker, Scott and Stoica, Ion},
    Title= {Job Scheduling for Multi-User MapReduce Clusters},
    Year= {2009},
    Month= {Apr},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-55.html},
    Number= {UCB/EECS-2009-55},
    Abstract= {Sharing a MapReduce cluster between users is attractive because it enables statistical multiplexing (lowering costs) and allows users to share a common large data set. However, we find that traditional scheduling algorithms can perform very poorly in MapReduce due to two aspects of the MapReduce setting: the need for data locality (running computation where the data is) and the dependence between map and reduce tasks. We illustrate these problems through our experience designing a fair scheduler for MapReduce at Facebook, which runs a 600-node multi-user data warehouse on Hadoop. We developed two simple techniques, delay scheduling and copy-compute splitting, which improve throughput and response times by factors of 2 to 10. Although we focus on multi-user workloads, our techniques can also raise throughput in a single-user, FIFO workload by a factor of 2.},
}

EndNote citation:

%0 Report
%A Zaharia, Matei 
%A Borthakur, Dhruba 
%A Sen Sarma, Joydeep 
%A Elmeleegy, Khaled 
%A Shenker, Scott 
%A Stoica, Ion 
%T Job Scheduling for Multi-User MapReduce Clusters
%I EECS Department, University of California, Berkeley
%D 2009
%8 April 30
%@ UCB/EECS-2009-55
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-55.html
%F Zaharia:EECS-2009-55