Communication-Efficient Distributed Stochastic Gradient Descent with Butterfly Mixing
Huasha Zhao and John F. Canny
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2012-96
May 11, 2012
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-96.pdf
Stochastic gradient descent is a widely used method to find locally-optimal models in machine learning and data mining. However, it is naturally a sequential algorithm, and parallelization involves severe compromises because the cost of synchronizing across a cluster is much larger than the time required to compute an optimal-sized gradient step. Here we explore butterfly mixing, where gradient steps are interleaved with the k stages of a butterfly network on 2^k nodes. UDP-based butterfly mix steps should be extremely fast and failure-tolerant, and convergence is almost as fast as a full mix (AllReduce) on every step.
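The abstract describes the scheme precisely enough to simulate: each node takes a local gradient step, then performs one stage of a butterfly exchange, cycling through all k stages over successive steps. Below is a minimal single-process sketch under those assumptions; the names (`butterfly_mix_sgd`, `grad_fn`, `data_shards`) and the pairwise-averaging mix rule are illustrative, not the report's actual implementation, which runs the mix over the network (the abstract proposes UDP).

```python
import numpy as np

def butterfly_mix_sgd(data_shards, grad_fn, dim, lr=0.1):
    """Single-process simulation of butterfly-mixed SGD (a sketch).

    data_shards : list of 2^k shards, one per virtual node; each shard
                  is a list of minibatches (assumed equal length here).
    grad_fn     : grad_fn(model, batch) -> gradient, a length-`dim` array.
    """
    n = len(data_shards)
    k = n.bit_length() - 1
    assert n >= 2 and n == 2 ** k, "node count must be 2^k with k >= 1"

    models = [np.zeros(dim) for _ in range(n)]
    stage = 0
    for t in range(len(data_shards[0])):
        # 1) Local gradient step: each node updates on its own minibatch.
        for i in range(n):
            models[i] = models[i] - lr * grad_fn(models[i], data_shards[i][t])

        # 2) One butterfly stage: node i averages with partner i XOR 2^stage.
        #    Cycling through all k stages spreads every node's update to
        #    every other node, matching a full AllReduce after k steps.
        models = [0.5 * (models[i] + models[i ^ (1 << stage)])
                  for i in range(n)]
        stage = (stage + 1) % k
    return models
```

Each iteration costs one gradient step plus one exchange with a single partner, rather than a full AllReduce across all nodes, which is the communication saving the abstract claims; because the k stages compose to a complete average, convergence closely tracks full mixing on every step.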
Advisor: John F. Canny
BibTeX citation:
@mastersthesis{Zhao:EECS-2012-96,
    Author = {Zhao, Huasha and Canny, John F.},
    Title = {Communication-Efficient Distributed Stochastic Gradient Descent with Butterfly Mixing},
    School = {EECS Department, University of California, Berkeley},
    Year = {2012},
    Month = {May},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-96.html},
    Number = {UCB/EECS-2012-96},
    Abstract = {Stochastic gradient descent is a widely used method to find locally-optimal models in machine learning and data mining. However, it is naturally a sequential algorithm, and parallelization involves severe compromises because the cost of synchronizing across a cluster is much larger than the time required to compute an optimal-sized gradient step. Here we explore butterfly mixing, where gradient steps are interleaved with the k stages of a butterfly network on 2^k nodes. UDP-based butterfly mix steps should be extremely fast and failure-tolerant, and convergence is almost as fast as a full mix (AllReduce) on every step.}
}
EndNote citation:
%0 Thesis
%A Zhao, Huasha
%A Canny, John F.
%T Communication-Efficient Distributed Stochastic Gradient Descent with Butterfly Mixing
%I EECS Department, University of California, Berkeley
%D 2012
%8 May 11
%@ UCB/EECS-2012-96
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-96.html
%F Zhao:EECS-2012-96