Patrick Wendell

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2013-79

May 16, 2013

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-79.pdf

Large-scale data analytics frameworks are shifting towards shorter task durations and larger degrees of parallelism to provide low latency. However, scheduling highly parallel jobs that complete in hundreds of milliseconds poses a major challenge for cluster schedulers, which will need to place millions of tasks per second on appropriate nodes while offering millisecond-level latency and high availability. We demonstrate that a decentralized, randomized sampling approach provides near-optimal performance while avoiding the throughput and availability limitations of a centralized design. We implement and deploy our scheduler, Sparrow, on a real cluster and demonstrate that Sparrow performs within 14% of an ideal scheduler.

Advisors: Ion Stoica


BibTeX citation:

@mastersthesis{Wendell:EECS-2013-79,
    Author= {Wendell, Patrick},
    Title= {Scalable Scheduling for Sub-Second Parallel Jobs},
    School= {EECS Department, University of California, Berkeley},
    Year= {2013},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-79.html},
    Number= {UCB/EECS-2013-79},
    Abstract= {Large-scale data analytics frameworks are shifting towards shorter task durations and larger
degrees of parallelism to provide low latency. However, scheduling highly parallel jobs that
complete in hundreds of milliseconds poses a major challenge for cluster schedulers, which will need
to place millions of tasks per second on appropriate nodes while offering millisecond-level
latency and high availability. We demonstrate that a decentralized, randomized sampling approach
provides near-optimal performance while avoiding the throughput and availability limitations of a
centralized design. We implement and deploy our scheduler, Sparrow, on a real cluster and
demonstrate that Sparrow performs within 14% of an ideal scheduler.},
}

EndNote citation:

%0 Thesis
%A Wendell, Patrick
%T Scalable Scheduling for Sub-Second Parallel Jobs
%I EECS Department, University of California, Berkeley
%D 2013
%8 May 16
%@ UCB/EECS-2013-79
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-79.html
%F Wendell:EECS-2013-79