CA-SVM: Communication-Avoiding Parallel Support Vector Machines on Distributed Systems

Yang You, James Demmel, Kenneth Czechowski, Le Song and Richard Vuduc

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2015-9
February 27, 2015

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-9.pdf

We consider the problem of how to design and implement communication-efficient versions of parallel support vector machines, a widely used classifier in statistical machine learning, for distributed-memory clusters and supercomputers. The main computational bottleneck is the training phase, in which a statistical model is built from an input data set. Prior to our study, the parallel isoefficiency of a state-of-the-art implementation scaled as W = Omega(P^3), where W is the problem size and P the number of processors; this scaling is worse than even a one-dimensional block-row dense matrix-vector multiplication, which has W = Omega(P^2). This study considers a series of algorithmic refinements, leading ultimately to a Communication-Avoiding SVM (CA-SVM) method that improves the isoefficiency to nearly W = Omega(P). We evaluate these methods on 96 to 1536 processors, and show speedups of 3-16x (7x on average) over Dis-SMO, and a 95% weak-scaling efficiency on six real-world datasets, with only modest losses in overall classification accuracy. The source code can be downloaded at [1].
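To give a rough sense of what these isoefficiency bounds mean in practice, the sketch below (illustrative numbers only, not results from the report) compares how quickly the problem size W must grow with the processor count P to hold parallel efficiency constant under each scaling regime mentioned in the abstract.

```python
# Illustrative comparison of isoefficiency scalings (hypothetical constants,
# not measurements from the report): an isoefficiency function W = Omega(P^k)
# means the problem size must grow at least like P^k to keep efficiency fixed.

def required_w(p, exponent, c=1.0):
    """Minimum problem size for constant efficiency when W = Omega(P^exponent)."""
    return c * p ** exponent

for p in [96, 384, 1536]:
    w_smo = required_w(p, 3)  # prior distributed SMO: W = Omega(P^3)
    w_mv = required_w(p, 2)   # 1-D block-row dense matrix-vector multiply: W = Omega(P^2)
    w_ca = required_w(p, 1)   # CA-SVM: nearly W = Omega(P)
    print(f"P={p:5d}: W_SMO={w_smo:.3g}  W_matvec={w_mv:.3g}  W_CA-SVM={w_ca:.3g}")
```

The gap between the P^3 and P curves is the point of the paper: at 1536 processors, the prior method needs a problem roughly 1536^2 times larger than CA-SVM to stay equally efficient.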


BibTeX citation:

@techreport{You:EECS-2015-9,
    Author = {You, Yang and Demmel, James and Czechowski, Kenneth and Song, Le and Vuduc, Richard},
    Title = {CA-SVM: Communication-Avoiding Parallel Support Vector Machines on Distributed Systems},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2015},
    Month = {Feb},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-9.html},
    Number = {UCB/EECS-2015-9},
    Abstract = {We consider the problem of how to design and implement communication-efficient versions of parallel support vector machines, a widely used classifier in statistical machine learning, for distributed-memory clusters and supercomputers. The main computational bottleneck is the training phase, in which a statistical model is built from an input data set. Prior to our study, the parallel isoefficiency of a state-of-the-art implementation scaled as W = Omega(P^3), where W is the problem size and P the number of processors; this scaling is worse than even a one-dimensional block-row dense matrix-vector multiplication, which has W = Omega(P^2).
This study considers a series of algorithmic refinements, leading ultimately to a Communication-Avoiding SVM (CA-SVM) method that improves the isoefficiency to nearly W = Omega(P). We evaluate these methods on 96 to 1536 processors, and show speedups of 3-16x (7x on average) over Dis-SMO, and a 95% weak-scaling efficiency on six real-world datasets, with only modest losses in overall classification accuracy. The source code can be downloaded at [1].}
}

EndNote citation:

%0 Report
%A You, Yang
%A Demmel, James
%A Czechowski, Kenneth
%A Song, Le
%A Vuduc, Richard
%T CA-SVM: Communication-Avoiding Parallel Support Vector Machines on Distributed Systems
%I EECS Department, University of California, Berkeley
%D 2015
%8 February 27
%@ UCB/EECS-2015-9
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-9.html
%F You:EECS-2015-9