Low-Dimensional Models for PCA and Regression

Dapo Omidiran

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2013-194
December 1, 2013

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-194.pdf

This thesis examines two separate statistical problems for which low-dimensional models are effective.

In the first part of this thesis, we examine the Robust Principal Components Analysis (RPCA) problem: given a matrix $\datam$ that is the sum of a low-rank matrix $\lowopt$ and a sparse noise matrix $\sparseopt$, recover $\lowopt$ and $\sparseopt$. This problem appears in various settings, including image processing, computer vision, and graphical models. Various polynomial-time heuristics and algorithms have been proposed to solve this problem. We introduce a block coordinate descent algorithm for this problem and prove a convergence result. In addition, our iterative algorithm has low complexity per iteration and empirically performs well on synthetic datasets.

In the second part of this thesis, we examine a variant of ridge regression: unlike in the classical setting where we know that the parameter of interest lies near a single point, we instead only know that it lies near a known low-dimensional subspace. We formulate this regression problem as a convex optimization problem, and introduce an efficient block coordinate descent algorithm for solving it. We demonstrate that this ``subspace prior" version of ridge regression is an appropriate model for understanding player effectiveness in basketball. In particular, we apply our algorithm to real-world data and demonstrate empirically that it produces a more accurate model of player effectiveness by showing that (1) the algorithm outperforms existing approaches and (2) it leads to a profitable betting strategy.

Advisor: Laurent El Ghaoui


BibTeX citation:

@phdthesis{Omidiran:EECS-2013-194,
    Author = {Omidiran, Dapo},
    Title = {Low-Dimensional Models for PCA and Regression},
    School = {EECS Department, University of California, Berkeley},
    Year = {2013},
    Month = {Dec},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-194.html},
    Number = {UCB/EECS-2013-194},
    Abstract = {This thesis examines two separate statistical problems for which
low-dimensional models are effective.

In the first part of this thesis, we examine the Robust Principal Components Analysis
(RPCA) problem: given a
matrix $\datam$ that is the sum of a low-rank matrix $\lowopt$ and a sparse noise matrix
$\sparseopt$, recover $\lowopt$ and $\sparseopt$.
This problem appears in various settings, including image processing,
computer vision, and graphical models. Various polynomial-time heuristics
and algorithms have been proposed to solve this problem.
We introduce a block coordinate descent algorithm for this
problem and prove a convergence result. In addition, our iterative
algorithm has low complexity per iteration and empirically performs well
on synthetic datasets.

In the second part of this thesis, we examine a variant of ridge regression:
unlike in the classical setting where we know that the parameter of
interest lies near a single point, we instead only know that it lies near
a known low-dimensional subspace.
We formulate this regression problem as a convex optimization problem, and
introduce an efficient block coordinate descent algorithm for solving it.
We demonstrate that this ``subspace prior" version of ridge regression is
an appropriate model for understanding player effectiveness in basketball.
In particular, we apply our algorithm to real-world data and demonstrate
empirically that it produces a more accurate model of player effectiveness
by showing that (1) the algorithm outperforms existing approaches and (2)
it leads to a profitable betting strategy.}
}

EndNote citation:

%0 Thesis
%A Omidiran, Dapo
%T Low-Dimensional Models for PCA and Regression
%I EECS Department, University of California, Berkeley
%D 2013
%8 December 1
%@ UCB/EECS-2013-194
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-194.html
%F Omidiran:EECS-2013-194