Yang You and Zhao Zhang and Cho-Jui Hsieh and James Demmel and Kurt Keutzer

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2020-18

January 25, 2020

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-18.pdf

In this paper, we investigate the capability of large-scale computers to speed up deep neural network (DNN) training. Our approach is to use a large batch size, powered by the Layer-wise Adaptive Rate Scaling (LARS) algorithm, to make efficient use of massive computing resources. Our approach is generic: we empirically evaluate its effectiveness on two neural networks, AlexNet and ResNet-50, trained on the ImageNet-1k dataset, while preserving state-of-the-art test accuracy. Compared to the baseline of a previous study by researchers at Facebook, our approach achieves higher test accuracy at batch sizes larger than 16K. Using 2,048 Intel Xeon Platinum 8160 processors, we reduce the 100-epoch AlexNet training time from hours to 11 minutes. With 2,048 Intel Xeon Phi 7250 processors, we reduce the 90-epoch ResNet-50 training time from hours to 20 minutes. Our implementation is open source and has been released in the Intel distribution of Caffe v1.0.7.
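
For readers unfamiliar with LARS, the sketch below shows the layer-wise update rule in plain NumPy: each layer gets its own learning rate proportional to ||w|| / (||g|| + beta * ||w||), so no single layer's update dominates at very large batch sizes. This is a minimal illustration of the technique, not the released Intel Caffe implementation; the function name and default hyperparameters are assumptions for this note.

    import numpy as np

    def lars_update(weights, grads, velocities, global_lr,
                    trust_coef=0.001, momentum=0.9, weight_decay=0.0005):
        # One LARS step over lists of per-layer parameter arrays.
        # Each layer receives a local learning rate scaled by the ratio
        # of its weight norm to its gradient norm (the "trust ratio").
        for w, g, v in zip(weights, grads, velocities):
            w_norm = np.linalg.norm(w)
            g_norm = np.linalg.norm(g)
            if w_norm > 0 and g_norm > 0:
                local_lr = trust_coef * w_norm / (g_norm + weight_decay * w_norm)
            else:
                local_lr = 1.0  # fall back to the global rate for degenerate layers
            # Momentum update; L2 regularization is folded into the gradient.
            v[:] = momentum * v + global_lr * local_lr * (g + weight_decay * w)
            w[:] -= v

In practice the global rate is also warmed up and decayed over epochs; the trust coefficient above follows the value commonly quoted for LARS and should likewise be treated as an assumption.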


BibTeX citation:

@techreport{You:EECS-2020-18,
    Author= {You, Yang and Zhang, Zhao and Hsieh, Cho-Jui and Demmel, James and Keutzer, Kurt},
    Title= {ImageNet Training in Minutes},
    Year= {2020},
    Month= {Jan},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-18.html},
    Number= {UCB/EECS-2020-18},
    Abstract= {In this paper, we investigate the capability of large-scale computers to speed up deep neural network (DNN) training. Our approach is to use a large batch size, powered by the Layer-wise Adaptive Rate Scaling (LARS) algorithm, to make efficient use of massive computing resources. Our approach is generic: we empirically evaluate its effectiveness on two neural networks, AlexNet and ResNet-50, trained on the ImageNet-1k dataset, while preserving state-of-the-art test accuracy. Compared to the baseline of a previous study by researchers at Facebook, our approach achieves higher test accuracy at batch sizes larger than 16K. Using 2,048 Intel Xeon Platinum 8160 processors, we reduce the 100-epoch AlexNet training time from hours to 11 minutes. With 2,048 Intel Xeon Phi 7250 processors, we reduce the 90-epoch ResNet-50 training time from hours to 20 minutes. Our implementation is open source and has been released in the Intel distribution of Caffe v1.0.7.},
}

EndNote citation:

%0 Report
%A You, Yang 
%A Zhang, Zhao 
%A Hsieh, Cho-Jui 
%A Demmel, James 
%A Keutzer, Kurt 
%T ImageNet Training in Minutes
%I EECS Department, University of California, Berkeley
%D 2020
%8 January 25
%@ UCB/EECS-2020-18
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-18.html
%F You:EECS-2020-18