On Systems and Algorithms for Distributed Machine Learning

Robert Nishihara

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2019-30

May 10, 2019

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-30.pdf

The advent of algorithms capable of leveraging vast quantities of data and computational resources has led to the proliferation of systems and tools aimed at facilitating the development and usage of these algorithms. Hardware trends, including the end of Moore's Law and the maturation of cloud computing, have placed a premium on the development of scalable algorithms designed for parallel architectures. The combination of these factors has made distributed computing an integral part of machine learning in practice.

This thesis examines the design of systems and algorithms to support machine learning in the distributed setting. The distributed computing landscape today consists of many domain-specific tools. We argue that these tools underestimate the generality of many modern machine learning applications and hence struggle to support them. We examine the requirements of a system capable of supporting modern machine learning workloads and present a general-purpose distributed system architecture for doing so. In addition, we examine several examples of specific distributed learning algorithms. We explore the theoretical properties of these algorithms and see how they can leverage such a system.

Advisor: Michael Jordan

BibTeX citation:

@phdthesis{Nishihara:EECS-2019-30,
    Author= {Nishihara, Robert},
    Title= {On Systems and Algorithms for Distributed Machine Learning},
    School= {EECS Department, University of California, Berkeley},
    Year= {2019},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-30.html},
    Number= {UCB/EECS-2019-30},
    Abstract= {The advent of algorithms capable of leveraging vast quantities of data and computational resources has led to the proliferation of systems and tools aimed at facilitating the development and usage of these algorithms. Hardware trends, including the end of Moore's Law and the maturation of cloud computing, have placed a premium on the development of scalable algorithms designed for parallel architectures. The combination of these factors has made distributed computing an integral part of machine learning in practice.

This thesis examines the design of systems and algorithms to support machine learning in the distributed setting. The distributed computing landscape today consists of many domain-specific tools. We argue that these tools underestimate the generality of many modern machine learning applications and hence struggle to support them. We examine the requirements of a system capable of supporting modern machine learning workloads and present a general-purpose distributed system architecture for doing so. In addition, we examine several examples of specific distributed learning algorithms. We explore the theoretical properties of these algorithms and see how they can leverage such a system.},
}

EndNote citation:

%0 Thesis
%A Nishihara, Robert
%T On Systems and Algorithms for Distributed Machine Learning
%I EECS Department, University of California, Berkeley
%D 2019
%8 May 10
%@ UCB/EECS-2019-30
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-30.html
%F Nishihara:EECS-2019-30