Statistics meets Optimization: Computational guarantees for statistical learning algorithms

Fanny Yang

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2018-126
August 21, 2018

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-126.pdf

Modern technological advances have prompted massive-scale data collection in fields ranging from artificial intelligence to the traditional sciences. This has led to an increasing need for scalable machine learning algorithms and statistical methods to draw conclusions about the world. In all data-driven procedures, the data scientist faces the same fundamental questions: How should I design the learning algorithm, and how long should I run it? Which samples should I collect for training, and how many are sufficient to generalize conclusions to unseen data? These questions relate to statistical and computational properties of both the data and the algorithm. This thesis explores their role in the areas of non-convex optimization, non-parametric estimation, active learning, and multiple testing.

In the first part of this thesis, we provide insights of different flavors into the interplay between the statistical and computational properties of first-order methods applied to common estimation procedures. The expectation-maximization (EM) algorithm estimates parameters of a latent variable model by running a first-order-type method on a non-convex landscape. We identify and characterize a general class of hidden Markov models for which linear convergence of EM to a statistically optimal point is provable within a large initialization radius. For non-parametric estimation problems, functional gradient descent (also called boosting) algorithms are used to estimate the best fit in infinite-dimensional function spaces. We develop a new proof technique showing that stopping the algorithm early can also yield an optimal estimator without explicit regularization. In fact, the same key quantities (localized complexities) underlie both traditional penalty-based and algorithmic regularization.
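As a rough illustration of the early-stopping phenomenon described above, the following sketch runs functional gradient descent (L2-boosting) over a kernel space on simulated data, with the number of iterations as the only form of regularization. The kernel, bandwidth, step size, and stopping iteration T are illustrative assumptions, not the choices analyzed in the thesis, where the stopping time is tied to localized complexities.

```python
import numpy as np

# Minimal sketch: functional gradient descent (L2-boosting) in a kernel space,
# regularized only by stopping early. The stopping iteration T is hand-picked
# here; in the thesis it would be determined by localized complexity measures.

rng = np.random.default_rng(0)
n = 100
x = np.sort(rng.uniform(0, 1, n))
f_star = np.sin(2 * np.pi * x)               # true regression function
y = f_star + 0.3 * rng.standard_normal(n)    # noisy observations

# RBF kernel matrix over the sample points (bandwidth chosen arbitrarily)
bandwidth = 0.1
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * bandwidth ** 2))

eta = n / np.linalg.eigvalsh(K).max()        # step size keeping the iteration stable
f = np.zeros(n)                              # start from the zero function
T = 50                                       # early-stopping iteration (hand-picked)

for t in range(T):
    residual = y - f                         # negative functional gradient of squared loss
    f = f + (eta / n) * K @ residual         # one boosting / kernel gradient-descent step

print("error to truth after early stopping:", np.mean((f - f_star) ** 2))
```

Running the loop far beyond T would eventually interpolate the noise in y; choosing the stopping time well is what plays the role of an explicit penalty.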

In the second part of the thesis, we explore how data collected adaptively, based on constantly updated estimates, can lead to a significant reduction in sample complexity for multiple hypothesis testing problems. In particular, we show how adaptive strategies can be used to simultaneously control the false discovery rate over multiple tests and return the best alternative (among many) for each test with optimal sample complexity, all in an online manner.
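To give a sense of how adaptive sampling reduces sample complexity, the sketch below runs a standard successive-elimination strategy to identify the best alternative among a few simulated options; the arm means, confidence level, and confidence radius are illustrative assumptions. The online false-discovery-rate component of the framework, in which each such experiment contributes a p-value to a growing sequence of tests, is omitted here.

```python
import numpy as np

# Minimal sketch: adaptive sampling (successive elimination) to find the best
# alternative among several options, typically with far fewer samples than
# uniform allocation. In the thesis framework, each such experiment would also
# produce a p-value comparing the winner against a control, to be handled by an
# online FDR procedure; that part is not implemented here.

rng = np.random.default_rng(1)
means = np.array([0.3, 0.5, 0.52, 0.8])      # unknown arm means (Bernoulli rewards)
delta = 0.05                                 # target failure probability

active = list(range(len(means)))
counts = np.zeros(len(means))
sums = np.zeros(len(means))

while len(active) > 1:
    for a in active:                         # pull every arm still in play
        sums[a] += rng.binomial(1, means[a])
        counts[a] += 1
    t = counts[active[0]]                    # all active arms share the same count
    # Hoeffding-style confidence radius (union bound over arms and rounds)
    radius = np.sqrt(np.log(4 * len(means) * t ** 2 / delta) / (2 * t))
    estimates = sums / np.maximum(counts, 1)
    best_lcb = max(estimates[a] - radius for a in active)
    # eliminate arms whose upper confidence bound falls below the best lower bound
    active = [a for a in active if estimates[a] + radius >= best_lcb]

print("best arm:", active[0], "total samples used:", int(counts.sum()))
```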

Advisor: Martin Wainwright


BibTeX citation:

@phdthesis{Yang:EECS-2018-126,
    Author = {Yang, Fanny},
    Title = {Statistics meets Optimization: Computational guarantees for statistical learning algorithms},
    School = {EECS Department, University of California, Berkeley},
    Year = {2018},
    Month = {Aug},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-126.html},
    Number = {UCB/EECS-2018-126},
    Abstract = {Modern technological advances have prompted massive-scale data collection in fields ranging from artificial intelligence to the traditional sciences. This has led to an increasing need for scalable machine learning algorithms and statistical methods to draw conclusions about the world. In all data-driven procedures, the data scientist faces the same fundamental questions: How should I design the learning algorithm, and how long should I run it? Which samples should I collect for training, and how many are sufficient to generalize conclusions to unseen data? These questions relate to statistical and computational properties of both the data and the algorithm. This thesis explores their role in the areas of non-convex optimization, non-parametric estimation, active learning, and multiple testing.

In the first part of this thesis, we provide insights of different flavors into the interplay between the statistical and computational properties of first-order methods applied to common estimation procedures. The expectation-maximization (EM) algorithm estimates parameters of a latent variable model by running a first-order-type method on a non-convex landscape. We identify and characterize a general class of hidden Markov models for which linear convergence of EM to a statistically optimal point is provable within a large initialization radius. For non-parametric estimation problems, functional gradient descent (also called boosting) algorithms are used to estimate the best fit in infinite-dimensional function spaces. We develop a new proof technique showing that stopping the algorithm early can also yield an optimal estimator without explicit regularization. In fact, the same key quantities (localized complexities) underlie both traditional penalty-based and algorithmic regularization.

In the second part of the thesis, we explore how data collected adaptively, based on constantly updated estimates, can lead to a significant reduction in sample complexity for multiple hypothesis testing problems. In particular, we show how adaptive strategies can be used to simultaneously control the false discovery rate over multiple tests and return the best alternative (among many) for each test with optimal sample complexity, all in an online manner.}
}

EndNote citation:

%0 Thesis
%A Yang, Fanny
%T Statistics meets Optimization: Computational guarantees for statistical learning algorithms
%I EECS Department, University of California, Berkeley
%D 2018
%8 August 21
%@ UCB/EECS-2018-126
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-126.html
%F Yang:EECS-2018-126