CA-SVM: Communication-Avoiding Parallel Support Vector Machines on Distributed Systems

Yang You, James Demmel, Kenneth Czechowski, Le Song and Richard Vuduc
EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2015-9
February 27, 2015
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-9.pdf
We consider the problem of how to design and implement communication-efficient versions of parallel support vector machines (SVMs), a widely used classifier in statistical machine learning, for distributed-memory clusters and supercomputers. The main computational bottleneck is the training phase, in which a statistical model is built from an input data set. Prior to our study, the parallel isoefficiency of a state-of-the-art implementation scaled as W = Omega(P^3), where W is the problem size and P the number of processors; this scaling is worse than even a one-dimensional block-row dense matrix-vector multiplication, which has W = Omega(P^2). This study considers a series of algorithmic refinements, leading ultimately to a Communication-Avoiding SVM (CA-SVM) method that improves the isoefficiency to nearly W = Omega(P). We evaluate these methods on 96 to 1536 processors and show speedups of 3-16x (7x on average) over Dis-SMO, a distributed sequential-minimal-optimization baseline, together with 95% weak-scaling efficiency on six real-world datasets, with only modest losses in overall classification accuracy. The source code can be downloaded at [1].
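For readers unfamiliar with the isoefficiency metric quoted above, the following sketch uses the standard textbook definitions (e.g., Grama et al.), not material from the report itself: T_1 denotes sequential running time (proportional to the problem size W), T_P the parallel time on P processors, and T_o(W, P) = P*T_P - T_1 the total parallel overhead.

    % Isoefficiency in brief: efficiency stays fixed only if the
    % problem size W grows as fast as the total overhead T_o(W, P).
    \[
      E \;=\; \frac{T_1}{P\,T_P} \;=\; \frac{W}{W + T_o(W,P)}
      \qquad\Longrightarrow\qquad
      E \text{ fixed} \;\iff\; W \;=\; \Theta\bigl(T_o(W,P)\bigr).
    \]

The consequence of the bounds in the abstract: with W = Omega(P^3), doubling the processor count requires at least 8x more work to hold efficiency constant, whereas under the near-linear W = Omega(P) of CA-SVM the work need only roughly double, which is consistent with the reported 95% weak-scaling efficiency.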
BibTeX citation:
@techreport{You:EECS-2015-9,
    Author = {You, Yang and Demmel, James and Czechowski, Kenneth and Song, Le and Vuduc, Richard},
    Title = {CA-SVM: Communication-Avoiding Parallel Support Vector Machines on Distributed Systems},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2015},
    Month = {Feb},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-9.html},
    Number = {UCB/EECS-2015-9},
    Abstract = {We consider the problem of how to design and implement communication-efficient versions of parallel support vector machines (SVMs), a widely used classifier in statistical machine learning, for distributed-memory clusters and supercomputers. The main computational bottleneck is the training phase, in which a statistical model is built from an input data set. Prior to our study, the parallel isoefficiency of a state-of-the-art implementation scaled as W = Omega(P^3), where W is the problem size and P the number of processors; this scaling is worse than even a one-dimensional block-row dense matrix-vector multiplication, which has W = Omega(P^2). This study considers a series of algorithmic refinements, leading ultimately to a Communication-Avoiding SVM (CA-SVM) method that improves the isoefficiency to nearly W = Omega(P). We evaluate these methods on 96 to 1536 processors and show speedups of 3-16x (7x on average) over Dis-SMO, together with 95% weak-scaling efficiency on six real-world datasets, with only modest losses in overall classification accuracy. The source code can be downloaded at [1].}
}
EndNote citation:
%0 Report
%A You, Yang
%A Demmel, James
%A Czechowski, Kenneth
%A Song, Le
%A Vuduc, Richard
%T CA-SVM: Communication-Avoiding Parallel Support Vector Machines on Distributed Systems
%I EECS Department, University of California, Berkeley
%D 2015
%8 February 27
%@ UCB/EECS-2015-9
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-9.html
%F You:EECS-2015-9