ImageNet Training in Minutes
Yang You and Zhao Zhang and Cho-Jui Hsieh and James Demmel and Kurt Keutzer
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2020-18
January 25, 2020
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-18.pdf
In this paper, we investigate the capability of large-scale computers to speed up deep neural network (DNN) training. Our approach is to use a large batch size, enabled by the Layer-wise Adaptive Rate Scaling (LARS) algorithm, to make efficient use of massive computing resources. Our approach is generic: we empirically evaluate its effectiveness on two neural networks, AlexNet and ResNet-50, trained on the ImageNet-1k dataset, while preserving state-of-the-art test accuracy. Compared to the baseline of a previous study by researchers at Facebook, our approach achieves higher test accuracy at batch sizes larger than 16K. Using 2,048 Intel Xeon Platinum 8160 processors, we reduce the 100-epoch AlexNet training time from hours to 11 minutes. With 2,048 Intel Xeon Phi 7250 processors, we reduce the 90-epoch ResNet-50 training time from hours to 20 minutes. Our implementation is open source and has been released in the Intel distribution of Caffe v1.0.7.
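For readers unfamiliar with LARS, the sketch below illustrates the core idea in NumPy: each layer receives its own learning rate, scaled by the ratio of its weight norm to its (regularized) gradient norm, so that no layer's update is large relative to its weights. This is a minimal sketch, not the released Caffe implementation; the function name and hyperparameter values are illustrative, and momentum is omitted for brevity.

import numpy as np

def lars_step(weights, grads, base_lr=0.01, trust_coef=0.001, weight_decay=5e-4):
    """Apply one LARS update to per-layer (weight, gradient) array pairs.

    Illustrative sketch: hyperparameter values are placeholders and
    momentum is omitted for brevity.
    """
    for w, g in zip(weights, grads):
        w_norm = np.linalg.norm(w)
        g_norm = np.linalg.norm(g)
        # Layer-wise learning rate: scale the global rate by the ratio of
        # the weight norm to the regularized gradient norm.
        if w_norm > 0 and g_norm > 0:
            local_lr = base_lr * trust_coef * w_norm / (g_norm + weight_decay * w_norm)
        else:
            local_lr = base_lr * trust_coef
        # In-place SGD step with L2 weight decay folded into the gradient.
        w -= local_lr * (g + weight_decay * w)

Because the trust ratio shrinks the step for layers whose gradients are large relative to their weights, the global learning rate can be raised aggressively with batch size, which is what allows training at batch sizes beyond 16K without losing accuracy.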
BibTeX citation:
@techreport{You:EECS-2020-18,
    Author = {You, Yang and Zhang, Zhao and Hsieh, Cho-Jui and Demmel, James and Keutzer, Kurt},
    Title = {ImageNet Training in Minutes},
    Year = {2020},
    Month = {Jan},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-18.html},
    Number = {UCB/EECS-2020-18},
    Abstract = {In this paper, we investigate the capability of large-scale computers to speed up deep neural network (DNN) training. Our approach is to use a large batch size, enabled by the Layer-wise Adaptive Rate Scaling (LARS) algorithm, to make efficient use of massive computing resources. Our approach is generic: we empirically evaluate its effectiveness on two neural networks, AlexNet and ResNet-50, trained on the ImageNet-1k dataset, while preserving state-of-the-art test accuracy. Compared to the baseline of a previous study by researchers at Facebook, our approach achieves higher test accuracy at batch sizes larger than 16K. Using 2,048 Intel Xeon Platinum 8160 processors, we reduce the 100-epoch AlexNet training time from hours to 11 minutes. With 2,048 Intel Xeon Phi 7250 processors, we reduce the 90-epoch ResNet-50 training time from hours to 20 minutes. Our implementation is open source and has been released in the Intel distribution of Caffe v1.0.7.},
}
EndNote citation:
%0 Report
%A You, Yang
%A Zhang, Zhao
%A Hsieh, Cho-Jui
%A Demmel, James
%A Keutzer, Kurt
%T ImageNet Training in Minutes
%I EECS Department, University of California, Berkeley
%D 2020
%8 January 25
%@ UCB/EECS-2020-18
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-18.html
%F You:EECS-2020-18