Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

John Duchi, Elad Hazan and Yoram Singer

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2010-24
March 3, 2010

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-24.pdf

We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. Metaphorically, the adaptation allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm stems from recent advances in stochastic optimization and online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. We give several efficient algorithms for empirical risk minimization problems with common and important regularization functions and domain constraints. We experimentally study our theoretical analysis and show that adaptive subgradient methods significantly outperform state-of-the-art, yet non-adaptive, subgradient algorithms.


BibTeX citation:

@techreport{Duchi:EECS-2010-24,
    Author = {Duchi, John and Hazan, Elad and Singer, Yoram},
    Title = {Adaptive Subgradient Methods for Online Learning and Stochastic Optimization},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2010},
    Month = {Mar},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-24.html},
    Number = {UCB/EECS-2010-24},
    Abstract = {We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. Metaphorically, the adaptation allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm stems from recent advances in stochastic optimization and online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. We give several efficient algorithms for empirical risk minimization problems with common and important regularization functions and domain constraints. We experimentally study our theoretical analysis and show that adaptive subgradient methods significantly outperform state-of-the-art, yet non-adaptive, subgradient algorithms.}
}

EndNote citation:

%0 Report
%A Duchi, John
%A Hazan, Elad
%A Singer, Yoram
%T Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
%I EECS Department, University of California, Berkeley
%D 2010
%8 March 3
%@ UCB/EECS-2010-24
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-24.html
%F Duchi:EECS-2010-24