Accelerating Deep Learning on Heterogeneous Architectures
Avinash Nandakumar
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2022-100
May 13, 2022
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-100.pdf
The growth of machine learning workloads, specifically deep neural networks (DNNs), in both warehouse-scale computing (WSC) and on-edge mobile computing has driven a huge demand for different types of accelerators. This project explores the different levels of parallelism available when running deep learning inference on heterogeneous architectures and characterizes the coordination of distinct accelerators across varying workloads. We have implemented an accelerated depthwise convolution kernel on a vector accelerator and explored the design space of executing MobileNetv2 in different configurations on an architecture consisting of both a systolic and a vector accelerator. This work examines shared resource contention at the memory level on this architecture and analyzes the effects of model pipelining and batch parallelism. Through layer-by-layer performance and cache analysis, we identify the best parameters and configurations for MobileNetv2 inference, observing 1.4x and 3.5x speedups over a naively accelerated baseline on single-core and multi-core SoCs, respectively.
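For context on the kernel named in the abstract: a depthwise convolution applies one filter per input channel with no reduction across channels, which gives it low arithmetic intensity and makes it a better fit for a vector unit than for a systolic array tuned for dense matrix multiplies. The C sketch below shows the basic computation only; the shapes, names, stride-1/valid-padding choices are illustrative assumptions, not the report's actual implementation. The two inner spatial loops are the natural candidates for vectorization.

#include <stddef.h>

/*
 * Minimal reference sketch of a 2-D depthwise convolution.
 * Each of the C channels is convolved independently with its own
 * KxK filter (stride 1, no padding). Layouts are row-major:
 *   in  : [C][H][W]    input feature map
 *   wts : [C][K][K]    one filter per channel
 *   out : [C][OH][OW]  output, OH = H-K+1, OW = W-K+1
 */
void depthwise_conv2d(const float *in, const float *wts, float *out,
                      int C, int H, int W, int K)
{
    int OH = H - K + 1, OW = W - K + 1;
    for (int c = 0; c < C; c++)              /* channels are independent */
        for (int oh = 0; oh < OH; oh++)
            for (int ow = 0; ow < OW; ow++) {
                float acc = 0.0f;
                for (int kh = 0; kh < K; kh++)       /* KxK window */
                    for (int kw = 0; kw < K; kw++)
                        acc += in[(c * H + oh + kh) * W + (ow + kw)]
                             * wts[(c * K + kh) * K + kw];
                out[(c * OH + oh) * OW + ow] = acc;  /* no cross-channel sum */
            }
}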
Advisors: Sophia Shao and Borivoje Nikolic
BibTeX citation:
@mastersthesis{Nandakumar:EECS-2022-100,
    Author = {Nandakumar, Avinash},
    Editor = {Shao, Sophia and Nikolic, Borivoje},
    Title = {Accelerating Deep Learning on Heterogeneous Architectures},
    School = {EECS Department, University of California, Berkeley},
    Year = {2022},
    Month = {May},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-100.html},
    Number = {UCB/EECS-2022-100},
    Abstract = {The growth of machine learning workloads, specifically deep neural networks (DNNs), in both warehouse-scale computing (WSC) and on-edge mobile computing has driven a huge demand for different types of accelerators. This project explores the different levels of parallelism available when running deep learning inference on heterogeneous architectures and characterizes the coordination of distinct accelerators across varying workloads. We have implemented an accelerated depthwise convolution kernel on a vector accelerator and explored the design space of executing MobileNetv2 in different configurations on an architecture consisting of both a systolic and a vector accelerator. This work examines shared resource contention at the memory level on this architecture and analyzes the effects of model pipelining and batch parallelism. Through layer-by-layer performance and cache analysis, we identify the best parameters and configurations for MobileNetv2 inference, observing 1.4x and 3.5x speedups over a naively accelerated baseline on single-core and multi-core SoCs, respectively.},
}
EndNote citation:
%0 Thesis %A Nandakumar, Avinash %E Shao, Sophia %E Nikolic, Borivoje %T Accelerating Deep Learning on Heterogeneous Architectures %I EECS Department, University of California, Berkeley %D 2022 %8 May 13 %@ UCB/EECS-2022-100 %U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-100.html %F Nandakumar:EECS-2022-100