Geometric Sampling Theory, Triangulations, and Robust Machine Learning
Marc Khoury
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2020-49
May 15, 2020
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-49.pdf
Manifold triangulation is the following problem: given a set of points $V$ sampled densely from a known manifold $\Sigma$, output a set of simplices $\mathcal{T}$ that is topologically identical to $\Sigma$ and geometrically close to $\Sigma$. In the first part of this thesis we study a variant of this problem with one additional constraint. In addition to a set of points $V$ densely sampled from a known manifold $\Sigma$, we are given a finite set of line segments $S$, with endpoints in $V$, that must appear as edges in the output triangulation $\mathcal{T}$. To solve this problem we introduce \emph{restricted constrained Delaunay triangulations} (restricted CDTs), which combine ideas from constrained Delaunay triangulations and restricted Delaunay triangulations to enable the enforcement of constraining edges in triangulations of smooth surfaces. We prove several combinatorial properties of restricted CDTs, including conditions under which the restricted CDT contains every constraining segment, conditions under which the restricted CDT is homeomorphic to the underlying surface $\Sigma$, and a characterization of which vertices must be considered when computing the triangles near a segment. The restricted CDT has immediate practical applications in surface meshing and geometric modeling. Along the way we improve many commonly used supporting results in geometric sampling theory.
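To make the underlying notion of a restricted Delaunay triangulation concrete, here is a minimal sketch (not from the thesis, and without constraining segments) for the simplest case of a smooth closed curve in the plane: an edge between two samples is restricted Delaunay when its dual Voronoi ridge crosses the curve. The ellipse, the sample size, and the sign-change crossing test below are illustrative assumptions.

import numpy as np
from scipy.spatial import Voronoi

# Hypothetical setup: n samples on an ellipse, standing in for a dense
# sample of a smooth closed curve Sigma.
a, b, n = 2.0, 1.0, 40
t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
V = np.column_stack([a * np.cos(t), b * np.sin(t)])

def f(p):
    # Implicit function of the ellipse: negative inside, positive outside.
    return (p[0] / a) ** 2 + (p[1] / b) ** 2 - 1.0

vor = Voronoi(V)
restricted_edges = []
for (i, j), ridge in zip(vor.ridge_points, vor.ridge_vertices):
    if -1 in ridge:
        # Unbounded ridge: it must cross the (bounded) curve whenever its
        # finite endpoint lies inside, since the ridge escapes to infinity.
        crosses = any(f(vor.vertices[v]) < 0.0 for v in ridge if v != -1)
    else:
        # Bounded ridge: a sign change of f at its endpoints certifies a
        # crossing. (A sketch only: ridges that cross an even number of
        # times, with both endpoints on one side, are missed.)
        vals = [f(vor.vertices[v]) for v in ridge]
        crosses = min(vals) < 0.0 < max(vals)
    if crosses:
        restricted_edges.append((i, j))

print(len(restricted_edges), "restricted Delaunay edges; expected", n)

For a sufficiently dense sample, the surviving edges connect consecutive samples in a closed polygon homeomorphic to the curve, a guarantee in the spirit of the sampling theory the thesis builds on.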
In the second part of this thesis we apply the geometric tools and high-dimensional intuition developed in the previous chapters to problems in machine learning. We study the problem of adversarial examples, a pervasive phenomenon in which perturbations of the input that are imperceptible to humans reliably cause machine learning models to produce confident incorrect classifications. We study robustness to adversarial examples under the ``Manifold Hypothesis'': the observation that `real' data often exhibits low-dimensional structure. Our results highlight the role of codimension, the difference between the dimension of the data manifold and the dimension of the embedding space, in adversarial robustness. We prove a tradeoff between robustness in different norms, show that adversarial training is sample inefficient, and show that robustness requires larger models.
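The role of codimension can be illustrated with a small NumPy experiment (not from the thesis): model the data manifold locally as a random $k$-dimensional subspace of $\mathbb{R}^{d}$ and split a random unit-norm perturbation into its tangent and normal components. In high codimension, almost all of the perturbation's energy lies in the $d - k$ normal directions, i.e. off the manifold. The dimensions chosen below are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
d, k = 1000, 10  # assumed embedding dimension and manifold dimension

# Model the data manifold locally by a random k-dimensional subspace of R^d,
# represented by an orthonormal tangent basis Q.
Q, _ = np.linalg.qr(rng.normal(size=(d, k)))

# A random unit-norm perturbation of a data point.
delta = rng.normal(size=d)
delta /= np.linalg.norm(delta)

tangent = Q @ (Q.T @ delta)   # component along the manifold
normal = delta - tangent      # component in the d - k normal directions

# In expectation the tangent fraction is k/d and the normal fraction is
# (d - k)/d, so in high codimension nearly all of the perturbation leaves
# the manifold.
print("tangent energy:", np.linalg.norm(tangent) ** 2)  # ~ 0.01
print("normal  energy:", np.linalg.norm(normal) ** 2)   # ~ 0.99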
Lastly, we study the relationship between robustness and optimization in the linear regression setting. We exhibit a learning problem for which the solution found by adaptive optimization algorithms has qualitatively worse robustness against both $L_{2}$- and $L_{\infty}$-adversaries than the solution found by non-adaptive algorithms. We then fully characterize the geometry of the loss landscape of $L_{2}$-adversarial training in least-squares linear regression. This geometry is subtle and has important consequences for optimization algorithms.
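As a hedged illustration of the adaptive-versus-non-adaptive gap (a synthetic random instance, not the construction from the thesis): on an overparameterized least-squares problem, gradient descent started from the origin converges to the minimum-$L_{2}$-norm interpolant, while Adam's per-coordinate rescaling leaves the row space of the data matrix and typically settles on an interpolant of larger norm. The norm matters because for $L_{2}$-adversarial training the inner maximization has the closed form $\max_{\|\delta\|_{2} \le \epsilon} (w^{\top}(x+\delta) - y)^2 = (|w^{\top}x - y| + \epsilon\|w\|_{2})^2$, so larger-norm solutions pay a larger adversarial penalty.

import numpy as np

# Hypothetical overparameterized least-squares instance (n < d), so the set
# of interpolating solutions is an affine subspace.
rng = np.random.default_rng(0)
n, d = 20, 100
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

def grad(w):
    return X.T @ (X @ w - y) / n

# Gradient descent from the origin stays in the row space of X and converges
# to the minimum-L2-norm interpolant.
w_gd = np.zeros(d)
for _ in range(5000):
    w_gd -= 0.1 * grad(w_gd)

# Adam's per-coordinate rescaling distorts this geometry; the interpolant it
# finds typically has a strictly larger L2 norm.
w_ad = np.zeros(d)
m, v = np.zeros(d), np.zeros(d)
b1, b2, adam_eps, lr = 0.9, 0.999, 1e-8, 0.01
for t in range(1, 20001):
    g = grad(w_ad)
    m = b1 * m + (1.0 - b1) * g
    v = b2 * v + (1.0 - b2) * g ** 2
    mhat = m / (1.0 - b1 ** t)
    vhat = v / (1.0 - b2 ** t)
    w_ad -= (lr / np.sqrt(t)) * mhat / (np.sqrt(vhat) + adam_eps)  # decayed step

print("GD   residual", np.linalg.norm(X @ w_gd - y), "norm", np.linalg.norm(w_gd))
print("Adam residual", np.linalg.norm(X @ w_ad - y), "norm", np.linalg.norm(w_ad))
# A larger-norm solution pays more under the pointwise adversarial loss
# (|w.x - y| + epsilon*||w||_2)^2, i.e. it is less robust to an L2-adversary.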
Advisor: Jonathan Shewchuk
BibTeX citation:
@phdthesis{Khoury:EECS-2020-49,
    Author = {Khoury, Marc},
    Title = {Geometric Sampling Theory, Triangulations, and Robust Machine Learning},
    School = {EECS Department, University of California, Berkeley},
    Year = {2020},
    Month = {May},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-49.html},
    Number = {UCB/EECS-2020-49}
}
EndNote citation:
%0 Thesis
%A Khoury, Marc
%T Geometric Sampling Theory, Triangulations, and Robust Machine Learning
%I EECS Department, University of California, Berkeley
%D 2020
%8 May 15
%@ UCB/EECS-2020-49
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-49.html
%F Khoury:EECS-2020-49