ImageNet Training in Minutes

Yang You, Zhao Zhang, Cho-Jui Hsieh, James Demmel and Kurt Keutzer

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2020-18
January 25, 2020

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-18.pdf

In this paper, we investigate the capability of large-scale computers to speed up deep neural network (DNN) training. Our approach is to use a large batch size, enabled by the Layer-wise Adaptive Rate Scaling (LARS) algorithm, to make efficient use of massive computing resources. The approach is generic: we empirically evaluate its effectiveness on two neural networks, AlexNet and ResNet-50, trained on the ImageNet-1k dataset, while preserving state-of-the-art test accuracy. Compared to the baseline of a previous study by researchers at Facebook, our approach achieves higher test accuracy for batch sizes larger than 16K. Using 2,048 Intel Xeon Platinum 8160 processors, we reduce the 100-epoch AlexNet training time from hours to 11 minutes. Using 2,048 Intel Xeon Phi 7250 processors, we reduce the 90-epoch ResNet-50 training time from hours to 20 minutes. Our implementation is open source and has been released in the Intel distribution of Caffe v1.0.7.
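
At the core of the approach is the LARS rule: each layer receives its own learning rate, scaled by the ratio of that layer's weight norm to its gradient norm, which keeps updates stable as the batch size (and hence the global learning rate) grows. The following NumPy sketch illustrates one LARS step under that description; the function and parameter names (lars_update, trust_coef, and so on) and the default values are illustrative assumptions and are not taken from the released Intel Caffe implementation.

import numpy as np

def lars_update(weights, grads, velocity, global_lr,
                momentum=0.9, weight_decay=0.0005, trust_coef=0.001):
    """One LARS step over lists of per-layer parameter arrays.

    Each layer gets a local learning rate proportional to
    ||w|| / (||g|| + weight_decay * ||w||), so layers with small
    gradients are not starved and layers with large gradients do
    not diverge at large batch sizes.
    """
    for i, (w, g) in enumerate(zip(weights, grads)):
        w_norm = np.linalg.norm(w)
        g_norm = np.linalg.norm(g)
        # Layer-wise trust ratio; fall back to 1.0 for degenerate norms.
        if w_norm > 0 and g_norm > 0:
            local_lr = trust_coef * w_norm / (g_norm + weight_decay * w_norm)
        else:
            local_lr = 1.0
        # Momentum SGD update scaled by the layer-wise learning rate.
        velocity[i] = momentum * velocity[i] + \
            global_lr * local_lr * (g + weight_decay * w)
        weights[i] = w - velocity[i]
    return weights, velocity

Here, weights and grads would be lists of per-layer arrays produced by a framework's backward pass, and velocity a list of zero-initialized arrays of the same shapes; the hyperparameter defaults shown are placeholders rather than the tuned values used in the paper's experiments.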


BibTeX citation:

@techreport{You:EECS-2020-18,
    Author = {You, Yang and Zhang, Zhao and Hsieh, Cho-Jui and Demmel, James and Keutzer, Kurt},
    Title = {ImageNet Training in Minutes},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2020},
    Month = {Jan},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-18.html},
    Number = {UCB/EECS-2020-18},
    Abstract = {In this paper, we investigate the capability of large-scale computers to speed up deep neural network (DNN) training. Our approach is to use a large batch size, enabled by the Layer-wise Adaptive Rate Scaling (LARS) algorithm, to make efficient use of massive computing resources. The approach is generic: we empirically evaluate its effectiveness on two neural networks, AlexNet and ResNet-50, trained on the ImageNet-1k dataset, while preserving state-of-the-art test accuracy. Compared to the baseline of a previous study by researchers at Facebook, our approach achieves higher test accuracy for batch sizes larger than 16K. Using 2,048 Intel Xeon Platinum 8160 processors, we reduce the 100-epoch AlexNet training time from hours to 11 minutes. Using 2,048 Intel Xeon Phi 7250 processors, we reduce the 90-epoch ResNet-50 training time from hours to 20 minutes. Our implementation is open source and has been released in the Intel distribution of Caffe v1.0.7.}
}

EndNote citation:

%0 Report
%A You, Yang
%A Zhang, Zhao
%A Hsieh, Cho-Jui
%A Demmel, James
%A Keutzer, Kurt
%T ImageNet Training in Minutes
%I EECS Department, University of California, Berkeley
%D 2020
%8 January 25
%@ UCB/EECS-2020-18
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-18.html
%F You:EECS-2020-18