Yeshwanth Cherapanamjeri

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2023-259

December 1, 2023

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-259.pdf

The design of statistical estimators robust to outliers has been a mainstay of statistical research over the past six decades. These techniques are even more pertinent in the contemporary landscape where large-scale machine learning systems are deployed in increasingly noisy and adaptive environments. In this thesis, we consider the task of building such an estimator for arguably the simplest possible statistical estimation problem -- that of mean estimation. There is surprisingly little understanding of the computational and statistical limits of estimation and the trade-offs incurred even for this relatively simple setting. We make progress on this problem along three complementary axes.

Our first contribution is a simple algorithmic framework for constructing robust estimators. Our framework allows for a significant speed-up over prior approaches for mean estimation while also allowing for easy extensibility to other statistical estimation tasks where it achieves state-of-the-art performance.

Secondly, we investigate the statistical boundaries of mean estimation where we demonstrate the necessary statistical degradation incurred in extremely heavy-tailed scenarios. While prior work showed that estimation could be performed as well as if one had access to Gaussian data, we establish that this is no longer true when the data possesses heavier tails. We provide lower bounds which exhibit this degradation and an (efficient) algorithm matching them.

Lastly, we consider the stability of these estimators to natural transformations of the data. Inspired by the empirical mean, classical work constructed estimators equivariant to affine transformations. These works, however, lacked the strong quantitative performance of more recent approaches. We demonstrate that such trade-offs are in fact necessary by constructing novel lower bounds for affine-equivariant estimators. We then show that classical estimators are quantitatively deficient even in this restricted class and devise an estimator based on a novel notion of a high-dimensional median which matches the lower bound.
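As background for the robust mean estimation discussed above, a minimal sketch of the classical median-of-means estimator -- a standard baseline for heavy-tailed data, not the estimator developed in this thesis -- could look like the following:

```python
import numpy as np

def median_of_means(samples, num_blocks):
    """Median-of-means: partition the samples into blocks, average each
    block, and return the median of the block means. A few outliers can
    corrupt only a few block means, so the median remains stable."""
    samples = np.asarray(samples, dtype=float)
    blocks = np.array_split(samples, num_blocks)
    block_means = np.array([block.mean() for block in blocks])
    return np.median(block_means)
```

For instance, with 99 samples equal to 1 and a single outlier of 10^6, the empirical mean is pulled to roughly 10^4, while median-of-means with 10 blocks confines the outlier to one block and returns 1.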

Advisor: Peter Bartlett


BibTeX citation:

@phdthesis{Cherapanamjeri:EECS-2023-259,
    Author= {Cherapanamjeri, Yeshwanth},
    Title= {Recent Developments in Robust Statistics},
    School= {EECS Department, University of California, Berkeley},
    Year= {2023},
    Month= {Dec},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-259.html},
    Number= {UCB/EECS-2023-259},
    Abstract= {The design of statistical estimators robust to outliers has been a mainstay of statistical research over the past six decades. These techniques are even more pertinent in the contemporary landscape where large-scale machine learning systems are deployed in increasingly noisy and adaptive environments. In this thesis, we consider the task of building such an estimator for arguably the simplest possible statistical estimation problem -- that of mean estimation. There is surprisingly little understanding of the computational and statistical limits of estimation and the trade-offs incurred even for this relatively simple setting. We make progress on this problem along three complementary axes.

Our first contribution is a simple algorithmic framework for constructing robust estimators. Our framework allows for a significant speed-up over prior approaches for mean estimation while also allowing for easy extensibility to other statistical estimation tasks where it achieves state-of-the-art performance. 

Secondly, we investigate the statistical boundaries of mean estimation where we demonstrate the necessary statistical degradation incurred in extremely heavy-tailed scenarios. While prior work showed that estimation could be performed as well as if one had access to Gaussian data, we establish that this is no longer true when the data possesses heavier tails. We provide lower bounds which exhibit this degradation and an (efficient) algorithm matching them.

Lastly, we consider the stability of these estimators to natural transformations of the data. Inspired by the empirical mean, classical work constructed estimators equivariant to affine transformations. These works, however, lacked the strong quantitative performance of more recent approaches. We demonstrate that such trade-offs are in fact necessary by constructing novel lower bounds for affine-equivariant estimators. We then show that classical estimators are quantitatively deficient even in this restricted class and devise an estimator based on a novel notion of a high-dimensional median which matches the lower bound.},
}

EndNote citation:

%0 Thesis
%A Cherapanamjeri, Yeshwanth 
%T Recent Developments in Robust Statistics
%I EECS Department, University of California, Berkeley
%D 2023
%8 December 1
%@ UCB/EECS-2023-259
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-259.html
%F Cherapanamjeri:EECS-2023-259