A Two-Regime Model of Network Pruning: Metrics and Scaling Laws

Yefan Zhou and Steven Gunarso and Chen Wang and Yaoqing Yang and Michael Mahoney

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2023-51

May 1, 2023

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-51.pdf

Due to the increasing memory and computation cost of large-scale deep neural networks, there is a growing trend in the AI community toward designing principled network compression approaches. In this Capstone Project, we identify a dichotomous phenomenon in a widely studied compression approach, network pruning, which removes unimportant network weights while maintaining the test-time performance of neural networks. We find that, depending on the relative size of the pruned weights and the training data, early stopping the training process before pruning can significantly help or harm the test-time performance of pruned models. Having shown the existence of this dichotomous phenomenon on both computer vision (CV) and natural language processing (NLP) benchmarks, we study the following two questions to quantify the phenomenon. (1) How do we determine the transition line between using early stopping and training to convergence (in the two-dimensional space of data size and pruned model size)? (2) If early stopping is needed, how do we determine the optimal early stopping time? We show that metrics derived from the recently proposed Heavy-Tailed Self-Regularization (HT-SR) theory and from the analysis of neural network loss landscapes can answer these questions. In particular, we show that the rand_distance metric, which measures the distance in the spectral domain between trained and randomly initialized weights, can be used to determine the early stopping time. We also present preliminary results showing that model similarity, a newly proposed metric related to the loss landscapes of neural networks, can determine the transition line between using early stopping and training to convergence.
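
As a rough illustration of the rand_distance idea described in the abstract, the sketch below compares the empirical spectral density (eigenvalues of W^T W) of a trained weight matrix against that of a same-shape random initialization. The function name, the Gaussian baseline, and the use of the Jensen-Shannon distance are illustrative assumptions and may differ from the report's exact definition.

# Minimal sketch (assumptions noted above), not the report's exact implementation.
import numpy as np
from scipy.spatial.distance import jensenshannon

def rand_distance_sketch(W_trained, seed=0, n_bins=100):
    """Spectral-domain distance between trained weights and a random baseline."""
    rng = np.random.default_rng(seed)
    # Random baseline with the same shape and scale as the trained matrix (assumed).
    W_rand = rng.normal(scale=W_trained.std(), size=W_trained.shape)

    # Empirical spectral densities: eigenvalues of the correlation matrices W^T W.
    esd_trained = np.linalg.eigvalsh(W_trained.T @ W_trained)
    esd_rand = np.linalg.eigvalsh(W_rand.T @ W_rand)

    # Histogram both spectra on a common support and compare the densities.
    bins = np.linspace(0.0, max(esd_trained.max(), esd_rand.max()), n_bins + 1)
    p, _ = np.histogram(esd_trained, bins=bins, density=True)
    q, _ = np.histogram(esd_rand, bins=bins, density=True)
    return jensenshannon(p, q)

# Example usage with a stand-in weight matrix:
# W = np.random.randn(512, 256)
# print(rand_distance_sketch(W))

Tracking a distance of this kind across training epochs, and stopping once it stabilizes, is one plausible way such a metric could be turned into an early stopping criterion.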

Advisor: Michael William Mahoney


BibTeX citation:

@mastersthesis{Zhou:EECS-2023-51,
    Author= {Zhou, Yefan and Gunarso, Steven and Wang, Chen and Yang, Yaoqing and Mahoney, Michael},
    Title= {A Two-Regime Model of Network Pruning: Metrics and Scaling Laws},
    School= {EECS Department, University of California, Berkeley},
    Year= {2023},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-51.html},
    Number= {UCB/EECS-2023-51},
    Abstract= {Due to the increasing memory and computation cost of large-scale deep neural networks, there is a growing trend in the AI community toward designing principled network compression approaches. In this Capstone Project, we identify a dichotomous phenomenon in a widely studied compression approach, network pruning, which removes unimportant network weights while maintaining the test-time performance of neural networks. We find that, depending on the relative size of the pruned weights and the training data, early stopping the training process before pruning can significantly help or harm the test-time performance of pruned models. Having shown the existence of this dichotomous phenomenon on both computer vision (CV) and natural language processing (NLP) benchmarks, we study the following two questions to quantify the phenomenon. (1) How do we determine the transition line between using early stopping and training to convergence (in the two-dimensional space of data size and pruned model size)? (2) If early stopping is needed, how do we determine the optimal early stopping time? We show that metrics derived from the recently proposed Heavy-Tailed Self-Regularization (HT-SR) theory and from the analysis of neural network loss landscapes can answer these questions. In particular, we show that the rand_distance metric, which measures the distance in the spectral domain between trained and randomly initialized weights, can be used to determine the early stopping time. We also present preliminary results showing that model similarity, a newly proposed metric related to the loss landscapes of neural networks, can determine the transition line between using early stopping and training to convergence.},
}

EndNote citation:

%0 Thesis
%A Zhou, Yefan 
%A Gunarso, Steven 
%A Wang, Chen 
%A Yang, Yaoqing 
%A Mahoney, Michael 
%T A Two-Regime Model of Network Pruning: Metrics and Scaling Laws
%I EECS Department, University of California, Berkeley
%D 2023
%8 May 1
%@ UCB/EECS-2023-51
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-51.html
%F Zhou:EECS-2023-51