Scaling Up Deep Learning on Clusters

Aleks Kamko

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2017-54
May 11, 2017

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-54.pdf

The Scaling Up project aims to develop a state-of-the-art machine learning framework that efficiently leverages the power of a cluster of machines. As data becomes increasingly plentiful, methods for efficiently applying computing power to process it are becoming more critical. Typical industry datasets are at least 1 terabyte in size and growing, making them infeasible to process on a single machine. As a result, developing algorithms and frameworks for training statistical models in a distributed, cluster-accelerated setting is an active area of research today.

Professor John Canny, our capstone advisor, has developed the BIDData Suite, a machine learning toolkit that expertly utilizes GPUs to achieve record-breaking "roofline" performance on a single machine. Our capstone focuses on extending BIDData's statistical models with the ability to train effectively in parallel on a cluster.

Our team has succeeded in developing multiple cluster-enabling modules within BIDData's codebase, including (1) an inter-machine communication framework, covered in Jiaqi Xie's technical report, (2) a network throughput monitor, covered in Quanlai Li's technical report, and (3) several distributed variants of practical machine learning models, covered in depth in Chapter 1 of this report.

Chapter 2 focuses on the issues that arise from the growing trend of using machine learning to analyze massive datasets in industry, and on how our project aims to alleviate some of these issues. Chapter 2 also provides an analysis of the market strategy for our industry partner, OpenChai, which is trying to bring the benefits of machine learning to lagging enterprise sectors such as healthcare and banking.

Advisor: John F. Canny


BibTeX citation:

@mastersthesis{Kamko:EECS-2017-54,
    Author = {Kamko, Aleks},
    Title = {Scaling Up Deep Learning on Clusters},
    School = {EECS Department, University of California, Berkeley},
    Year = {2017},
    Month = {May},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-54.html},
    Number = {UCB/EECS-2017-54},
    Abstract = {The Scaling Up project aims to develop a state-of-the-art machine learning framework that efficiently leverages the power of a cluster of machines. As data becomes increasingly plentiful, methods for efficiently applying computing power to process it are becoming more critical. Typical industry datasets are at least 1 terabyte in size and growing, making them infeasible to process on a single machine. As a result, developing algorithms and frameworks for training statistical models in a distributed, cluster-accelerated setting is an active area of research today.

Professor John Canny, our capstone advisor, has developed the BIDData Suite, a machine learning toolkit that expertly utilizes GPUs to achieve record-breaking "roofline" performance on a single machine. Our capstone focuses on extending BIDData's statistical models with the ability to train effectively in parallel on a cluster.

Our team has succeeded in developing multiple cluster-enabling modules within BIDData's codebase, including (1) an inter-machine communication framework, covered in Jiaqi Xie's technical report, (2) a network throughput monitor, covered in Quanlai Li's technical report, and (3) several distributed variants of practical machine learning models, covered in depth in Chapter 1 of this report.

Chapter 2 focuses on the issues that arise from the growing trend of using machine learning to analyze massive datasets in industry, and on how our project aims to alleviate some of these issues. Chapter 2 also provides an analysis of the market strategy for our industry partner, OpenChai, which is trying to bring the benefits of machine learning to lagging enterprise sectors such as healthcare and banking.}
}

EndNote citation:

%0 Thesis
%A Kamko, Aleks
%T Scaling Up Deep Learning on Clusters
%I EECS Department, University of California, Berkeley
%D 2017
%8 May 11
%@ UCB/EECS-2017-54
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-54.html
%F Kamko:EECS-2017-54