Towards Practical Serverless Analytics

Qifan Pu

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2019-105
June 25, 2019

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-105.pdf

Distributed computing remains inaccessible to a large number of users, in spite of many open-source platforms and extensive commercial offerings. Although many distributed computation frameworks have moved into the cloud, users there are still left to struggle with complex cluster management and configuration tools. In this thesis, we argue that cloud stateless functions represent a viable platform for these users, eliminating cluster management overhead and fulfilling the promise of elasticity. We first build a prototype system, PyWren, which runs on existing serverless function services, and show that this model is general enough to implement a number of distributed computing models, such as BSP (bulk synchronous parallel). We then identify two main challenges to supporting truly practical and general analytics on a serverless platform. The first challenge is to facilitate communication-intensive operations, such as shuffle, in the serverless setting. The second challenge is to provide elastic cloud memory. This thesis makes progress on both challenges. For the first, we develop a system called Locus, which can automate shuffle operations by judiciously provisioning hybrid intermediate storage. For the second, we present an algorithm, FairRide, that achieves near-optimal memory cache efficiency in a multi-tenant setting.

Advisor: Ion Stoica
BibTeX citation:

@phdthesis{Pu:EECS-2019-105,
    Author = {Pu, Qifan},
    Title = {Towards Practical Serverless Analytics},
    School = {EECS Department, University of California, Berkeley},
    Year = {2019},
    Month = {Jun},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-105.html},
    Number = {UCB/EECS-2019-105},
    Abstract = {Distributed computing remains inaccessible to a large number of users, in spite of many open-source platforms and extensive commercial offerings. Although many distributed computation frameworks have moved into the cloud, users there are still left to struggle with complex cluster management and configuration tools. In this thesis, we argue that cloud stateless functions represent a viable platform for these users, eliminating cluster management overhead and fulfilling the promise of elasticity. We first build a prototype system, PyWren, which runs on existing serverless function services, and show that this model is general enough to implement a number of distributed computing models, such as BSP (bulk synchronous parallel). We then identify two main challenges to supporting truly practical and general analytics on a serverless platform. The first challenge is to facilitate communication-intensive operations, such as shuffle, in the serverless setting. The second challenge is to provide elastic cloud memory. This thesis makes progress on both challenges. For the first, we develop a system called Locus, which can automate shuffle operations by judiciously provisioning hybrid intermediate storage. For the second, we present an algorithm, FairRide, that achieves near-optimal memory cache efficiency in a multi-tenant setting.}
}

EndNote citation:

%0 Thesis
%A Pu, Qifan
%T Towards Practical Serverless Analytics
%I EECS Department, University of California, Berkeley
%D 2019
%8 June 25
%@ UCB/EECS-2019-105
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-105.html
%F Pu:EECS-2019-105