Elicia Ye and Tianyu Pang and Alex Zhao and Yefan Zhou and Yaoqing Yang and Michael Mahoney and Kannan Ramchandran

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2023-126

May 12, 2023

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-126.pdf

Deep networks were introduced to mimic the human brain through artificial neurons, and they now perform a variety of tasks in computer vision (CV) and natural language processing (NLP). It is natural to assume that a network's representational power must scale in complexity with the tasks or datasets it processes. In practice, however, increasing the amount of data or the number of layers and parameters is not always the answer. This motivates a more comprehensive view of the inner workings of a deep neural network, examining each of its components in turn. A common approach is to examine the weights directly, but this risks missing information about the network's structure. As a middle ground, we analyze structural characteristics arising from layerwise spectral distributions in order to explain network performance and inform training procedures.
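
The layerwise spectral analysis mentioned above can be sketched concretely. The snippet below is a minimal illustration, not code from the report: it assumes PyTorch and torchvision, computes each layer's empirical spectral distribution (the eigenvalues of W^T W / N for a layer weight matrix W), and summarizes the tail with a simple Hill-style power-law exponent estimate. The helper names (layer_esd, hill_alpha) and the choice of ResNet-18 are illustrative assumptions.

# A minimal sketch (not the report's code): compute each layer's empirical
# spectral distribution (ESD) and a crude heavy-tail exponent estimate.
import numpy as np
import torch
import torch.nn as nn
import torchvision.models as models

def layer_esd(weight: torch.Tensor) -> np.ndarray:
    """Eigenvalues of the correlation matrix W^T W / N for one layer's weights."""
    W = weight.detach().cpu().float()
    if W.ndim > 2:                       # flatten conv kernels to (out, in*kH*kW)
        W = W.flatten(start_dim=1)
    N = max(W.shape)
    sv = torch.linalg.svdvals(W)         # singular values of W
    return (sv.numpy() ** 2) / N         # eigenvalues of (1/N) W^T W

def hill_alpha(eigs: np.ndarray, k: int = 20) -> float:
    """Hill estimator of a power-law tail exponent (a rough shape summary)."""
    srt = np.sort(eigs)
    tail, x_min = srt[-k:], srt[-(k + 1)]
    return 1.0 + k / np.sum(np.log(tail / x_min))

model = models.resnet18(weights=None)    # substitute any trained model of interest
for name, module in model.named_modules():
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        eigs = layer_esd(module.weight)
        if eigs.size > 20:               # need enough eigenvalues for a tail estimate
            print(f"{name:30s}  alpha ~ {hill_alpha(eigs):.2f}")

Shape summaries of this kind are one way layerwise spectral distributions can be compared across layers and related to network performance.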

Advisor: Kannan Ramchandran


BibTeX citation:

@mastersthesis{Ye:EECS-2023-126,
    Author= {Ye, Elicia and Pang, Tianyu and Zhao, Alex and Zhou, Yefan and Yang, Yaoqing and Mahoney, Michael and Ramchandran, Kannan},
    Title= {Layerwise Training of Deep Neural Networks},
    School= {EECS Department, University of California, Berkeley},
    Year= {2023},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-126.html},
    Number= {UCB/EECS-2023-126},
    Abstract= {Deep networks were introduced to mimic the human brain through artificial neurons, and they now perform a variety of tasks in computer vision (CV) and natural language processing (NLP). It is natural to assume that a network's representational power must scale in complexity with the tasks or datasets it processes. In practice, however, increasing the amount of data or the number of layers and parameters is not always the answer.
This motivates a more comprehensive view of the inner workings of a deep neural network, examining each of its components in turn. A common approach is to examine the weights directly, but this risks missing information about the network's structure. As a middle ground, we analyze structural characteristics arising from layerwise spectral distributions in order to explain network performance and inform training procedures.},
}

EndNote citation:

%0 Thesis
%A Ye, Elicia 
%A Pang, Tianyu 
%A Zhao, Alex 
%A Zhou, Yefan 
%A Yang, Yaoqing 
%A Mahoney, Michael 
%A Ramchandran, Kannan 
%T Layerwise Training of Deep Neural Networks
%I EECS Department, University of California, Berkeley
%D 2023
%8 May 12
%@ UCB/EECS-2023-126
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-126.html
%F Ye:EECS-2023-126