Algorithms for Robust Linear Models against Strong Adversarial Corruptions

Yimeng Wang

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2022-97
May 13, 2022

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-97.pdf

In real-world data science and machine learning, data are inevitably imperfect. Data contamination has many sources. It may come from human errors, which can be avoided with more caution. However, it may also come from sources such as systematic measurement errors and adversarial data poisoning, which are hard to avoid and even to detect. Consequently, there is a need for methods that can perform statistical tasks despite this difficulty. Formally speaking, we want to design efficient algorithms that provide provable guarantees for learning problems under certain models of contamination.

In this article, we examine some important techniques in the recent development of efficient algorithms for robust statistics, namely filtering-based methods and sum-of-squares techniques. Specifically, we focus on the problem of learning linear models (including linear regression, generalized linear models, etc.) under the strong contamination model. We fully present and analyze the conditions and consequences of SEVER [DKK+19] and the sum-of-squares-based algorithm for robust linear regression in [KKM20]. SEVER is a meta-algorithm that takes a well-conditioned base learner and outputs an outlier-robust version of that base learner. The [KKM20] robust linear regression algorithm is an elegant and simple application of sum-of-squares techniques to robust regression in general, including l1, l2, and polynomial regression. Both algorithms have O(√ε) dependence of the error on the fraction of outliers ε. We present and prove the theoretical guarantees of these algorithms, which shed light on future directions in which the error dependence and the required assumptions can be improved.
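To make the filtering-based approach concrete, the following is a minimal sketch in Python (NumPy only) of a SEVER-style filtering loop for linear regression. The function name, the thresholding rule, and the stopping condition are illustrative simplifications chosen here, not the exact procedure of [DKK+19]; the base learner is assumed to be supplied by the caller as a black box.

import numpy as np

def sever_style_filter(X, y, base_learner, n_rounds=10, c=4.0):
    # Hypothetical simplification of the SEVER filtering loop of [DKK+19]:
    # repeatedly fit the base learner, examine per-point gradients of the
    # squared loss, and discard points that stick out along the gradients'
    # top singular direction.
    active = np.arange(X.shape[0])                   # indices not yet filtered out
    for _ in range(n_rounds):
        w = base_learner(X[active], y[active])       # run the black-box base learner
        residuals = X[active] @ w - y[active]
        grads = residuals[:, None] * X[active]       # per-point gradients of the squared loss
        centered = grads - grads.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        scores = (centered @ vt[0]) ** 2             # squared projection onto top direction
        cutoff = c * np.median(scores)
        if scores.mean() <= cutoff:                  # no direction with outlying variance: stop
            break
        active = active[scores <= cutoff]            # filter the outlying points
    return base_learner(X[active], y[active]), active

# Example usage with ordinary least squares as the base learner:
ols = lambda A, b: np.linalg.lstsq(A, b, rcond=None)[0]

Roughly speaking, the SEVER algorithm analyzed in [DKK+19] removes points randomly with probability proportional to their scores rather than by a hard threshold; the sketch above is meant only to convey the structure of a filtering-based meta-algorithm wrapped around a black-box learner.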

Advisor: Alistair Sinclair


BibTeX citation:

@mastersthesis{Wang:EECS-2022-97,
    Author = {Wang, Yimeng},
    Title = {Algorithms for Robust Linear Models against Strong Adversarial Corruptions},
    School = {EECS Department, University of California, Berkeley},
    Year = {2022},
    Month = {May},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-97.html},
    Number = {UCB/EECS-2022-97},
    Abstract = {In real-world data science and machine learning, data are inevitably imperfect. Data contamination has many sources. It may come from human errors, which can be avoided with more caution. However, it may also come from sources such as systematic measurement errors and adversarial data poisoning, which are hard to avoid and even to detect. Consequently, there is a need for methods that can perform statistical tasks despite this difficulty. Formally speaking, we want to design efficient algorithms that provide provable guarantees for learning problems under certain models of contamination.
In this article, we examine some important techniques in the recent development of efficient algorithms for robust statistics, namely filtering-based methods and sum-of-squares techniques. Specifically, we focus on the problem of learning linear models (including linear regression, generalized linear models, etc.) under the strong contamination model. We fully present and analyze the conditions and consequences of SEVER [DKK+19] and the sum-of-squares-based algorithm for robust linear regression in [KKM20]. SEVER is a meta-algorithm that takes a well-conditioned base learner and outputs an outlier-robust version of that base learner. The [KKM20] robust linear regression algorithm is an elegant and simple application of sum-of-squares techniques to robust regression in general, including l1, l2, and polynomial regression. Both algorithms have O(√ε) dependence of the error on the fraction of outliers ε. We present and prove the theoretical guarantees of these algorithms, which shed light on future directions in which the error dependence and the required assumptions can be improved.}
}

EndNote citation:

%0 Thesis
%A Wang, Yimeng
%T Algorithms for Robust Linear Models against Strong Adversarial Corruptions
%I EECS Department, University of California, Berkeley
%D 2022
%8 May 13
%@ UCB/EECS-2022-97
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-97.html
%F Wang:EECS-2022-97