Object Management in a Distributed Futures System

Edward Oakes

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2020-119

May 29, 2020

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-119.pdf

In recent years, there has been an increasing demand for distributed data processing across a wide range of application domains. In addition to the previous generation of large-scale data processing, there are also new emerging applications that are centered around novel artificial intelligence and machine learning techniques. These applications require a much more flexible programming interface than the traditional static execution graph offered by bulk synchronous parallel systems. In response, a number of domain-specific distributed systems have been built to address the needs of each new type of application. Ray has the promise to act as a unified execution engine for these applications, enabling high-performance distributed execution with a simple but flexible futures-based programming model. However, the system falls short of these promises due to two key shortcomings: application-agnostic least recently used (LRU) eviction for shared-memory objects and high overhead for small objects. This work proposes a novel object management architecture for distributed futures that enables exact reference counting for shared-memory objects and reduces the overhead for tasks that depend on or produce small objects to nearly that of a single remote procedure call.

Advisor: Scott Shenker

BibTeX citation:

@mastersthesis{Oakes:EECS-2020-119,
    Author= {Oakes, Edward},
    Title= {Object Management in a Distributed Futures System},
    School= {EECS Department, University of California, Berkeley},
    Year= {2020},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-119.html},
    Number= {UCB/EECS-2020-119},
    Abstract= {In recent years, there has been an increasing demand for distributed data processing across a wide range of application domains. In addition to the previous generation of large-scale data processing, there are also new emerging applications that are centered around novel artificial intelligence and machine learning techniques. These applications require a much more flexible programming interface than the traditional static execution graph offered by bulk synchronous parallel systems. In response, a number of domain-specific distributed systems have been built to address the needs of each new type of application. Ray has the promise to act as a unified execution engine for these applications, enabling high-performance distributed execution with a simple but flexible futures-based programming model. However, the system falls short of these promises due to two key shortcomings: application-agnostic least recently used (LRU) eviction for shared-memory objects and high overhead for small objects. This work proposes a novel object management architecture for distributed futures that enables exact reference counting for shared-memory objects and reduces the overhead for tasks that depend on or produce small objects to nearly that of a single remote procedure call.},
}

EndNote citation:

%0 Thesis
%A Oakes, Edward
%T Object Management in a Distributed Futures System
%I EECS Department, University of California, Berkeley
%D 2020
%8 May 29
%@ UCB/EECS-2020-119
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-119.html
%F Oakes:EECS-2020-119