Factoring Matrices into Linear Neural Networks

Sagnik Bhattacharya and Jonathan Shewchuk

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2022-92
May 13, 2022

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-92.pdf

We characterize the topology and geometry of the set of all weight vectors for which a linear neural network computes the same linear transformation W. This set of weight assignments is called the fiber of W, and it is embedded in a Euclidean weight space of all possible weight vectors. The fiber is an algebraic variety with singular points, and hence is not a manifold. We show a way to stratify the fiber---that is, to partition the algebraic variety into a finite set of manifolds of varying dimensions called strata. We derive the dimensions of these strata and the relationships by which they adjoin each other. (Although they are disjoint, some strata lie in the closures of other, higher-dimensional strata.) Each stratum is smoothly embedded in weight space, so it has a well-defined tangent space (which is a subspace of weight space) at every point. We show how to determine the subspace tangent to a specified stratum at a specified point on the stratum, and we construct an elegant basis for that subspace.
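
As a concrete illustration of what the fiber contains (this sketch is not from the report; the network size and the matrices are arbitrary choices), the following NumPy snippet exhibits two different weight assignments for a two-layer linear network whose layer matrices multiply out to the same transformation W---that is, two points on the fiber of W:

    import numpy as np

    # Target linear transformation computed by a two-layer linear network x -> W2 @ W1 @ x.
    W = np.array([[2.0, 0.0],
                  [1.0, 3.0]])

    # One point on the fiber of W: the trivial factorization.
    W1_a, W2_a = np.eye(2), W.copy()

    # A different point on the same fiber, from a different factorization of W.
    W1_b = np.array([[1.0, 0.0],
                     [0.5, 1.0]])
    W2_b = W @ np.linalg.inv(W1_b)

    # Both weight assignments compute the same transformation W.
    assert np.allclose(W2_a @ W1_a, W)
    assert np.allclose(W2_b @ W1_b, W)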

To help achieve these goals, we first derive a Fundamental Theorem of Linear Neural Networks, analogous to Gilbert Strang's Fundamental Theorem of Linear Algebra. We show how to decompose each layer of a linear neural network into a set of subspaces that capture how information flows through the neural network---in particular, tracing which information is annihilated at which layers of the network, and identifying subspaces that carry no information but might become available to carry information as training modifies the network weights. We summarize properties of these information flows in basis flow diagrams that reveal a rich and occasionally surprising structure. Each stratum of the fiber represents a different pattern by which information flows (or fails to flow) through the neural network.
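
As a rough sketch of the kind of bookkeeping this involves (an illustration under our own assumptions, not the report's actual decomposition or its basis flow diagrams), one can compute, layer by layer, which directions of the signal each weight matrix annihilates:

    import numpy as np

    def null_space_basis(A, tol=1e-10):
        """Orthonormal basis (as columns) for the null space of A, computed from the SVD."""
        _, s, vt = np.linalg.svd(A)
        rank = int(np.sum(s > tol))
        return vt[rank:].T

    # A three-layer linear network x -> W3 @ W2 @ W1 @ x (matrices chosen for illustration).
    W1 = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])   # layer 1 annihilates the third input direction
    W2 = np.array([[1.0, 1.0]])        # layer 2 annihilates the direction (1, -1)
    W3 = np.array([[2.0]])             # layer 3 annihilates nothing

    print(null_space_basis(W1))                 # input information killed at layer 1
    print(null_space_basis(W2))                 # layer-1 output killed at layer 2
    print(np.linalg.matrix_rank(W3 @ W2 @ W1))  # dimension of information that survives every layer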

We use this knowledge to find transformations in weight space called moves that allow us to modify the neural network's weights without changing the linear transformation that the network computes. Some moves stay on the same stratum of the fiber, and some move from one stratum to another. In this way, we can visit different weight assignments for which the neural network computes the same transformation. These moves help us to construct a useful basis for the weight space and a useful basis for each space tangent to a stratum.
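
One well-known family of such transformations (we would guess it appears among the report's moves, but the precise definitions are the report's own) inserts an invertible matrix and its inverse between two consecutive layers; the composed transformation is unchanged:

    import numpy as np

    rng = np.random.default_rng(0)

    # A two-layer linear network computing W = W2 @ W1.
    W1 = rng.standard_normal((3, 4))
    W2 = rng.standard_normal((2, 3))
    W = W2 @ W1

    # A move: insert M and its inverse between the two layers.
    M = rng.standard_normal((3, 3))   # invertible with probability 1
    W1_new = M @ W1
    W2_new = W2 @ np.linalg.inv(M)

    # The weights changed, but the transformation the network computes did not.
    assert np.allclose(W2_new @ W1_new, W)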

Advisor: Jonathan Shewchuk


BibTeX citation:

@mastersthesis{Bhattacharya:EECS-2022-92,
    Author = {Bhattacharya, Sagnik and Shewchuk, Jonathan},
    Title = {Factoring Matrices into Linear Neural Networks},
    School = {EECS Department, University of California, Berkeley},
    Year = {2022},
    Month = {May},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-92.html},
    Number = {UCB/EECS-2022-92},
}

EndNote citation:

%0 Thesis
%A Bhattacharya, Sagnik
%A Shewchuk, Jonathan
%T Factoring Matrices into Linear Neural Networks
%I EECS Department, University of California, Berkeley
%D 2022
%8 May 13
%@ UCB/EECS-2022-92
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-92.html
%F Bhattacharya:EECS-2022-92