Accelerating Deep Learning on Heterogenous Architectures
Avinash Nandakumar
EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2022-100
May 13, 2022
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-100.pdf
The growth of machine learning workloads, specifically deep neural networks (DNNs), in both warehouse-scale computing (WSC) and edge mobile computing has driven huge demand for different types of accelerators. This project explores the different levels of parallelism available when running deep learning inference on heterogeneous architectures and characterizes the coordination of distinct accelerators across varying workloads. We have implemented an accelerated depthwise convolution kernel on a vector accelerator and explored the design space of executing MobileNetV2 in different configurations on an architecture consisting of both a systolic and a vector accelerator. This work examines shared resource contention at the memory level on this architecture and analyzes the effects of model pipelining and batch parallelism. Through layer-by-layer performance and cache analysis we identify the best parameters and configurations for MobileNetV2 inference, observing 1.4x and 3.5x speedups over a naively accelerated baseline on single-core and multi-core SoCs, respectively.
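For readers unfamiliar with the kernel at the heart of this work, below is a minimal NumPy sketch of depthwise convolution semantics (NHWC layout, stride 1, no padding). It illustrates the operation the report accelerates on a vector unit; it is not the report's implementation, and all names are illustrative.

import numpy as np

def depthwise_conv2d(x, w):
    # x: (H, W, C) input feature map; w: (KH, KW, C) one filter per channel.
    # Unlike a standard convolution, there is no reduction across channels:
    # channel c of the output depends only on channel c of the input.
    H, W, C = x.shape
    KH, KW, _ = w.shape
    out = np.zeros((H - KH + 1, W - KW + 1, C))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Per-channel multiply-accumulate over the KH x KW window;
            # the channel axis is the natural vectorization dimension.
            out[i, j, :] = (x[i:i + KH, j:j + KW, :] * w).sum(axis=(0, 1))
    return out

# Example: a 3x3 depthwise convolution over a 32-channel feature map.
x = np.random.rand(16, 16, 32)
w = np.random.rand(3, 3, 32)
y = depthwise_conv2d(x, w)   # shape (14, 14, 32)

Because each channel is processed independently, the channel loop maps naturally onto vector lanes, which is what makes this kernel a good fit for a vector accelerator.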
Advisor: Sophia Shao
BibTeX citation:
@mastersthesis{Nandakumar:EECS-2022-100,
    Author = {Nandakumar, Avinash},
    Editor = {Shao, Sophia and Nikolic, Borivoje},
    Title = {Accelerating Deep Learning on Heterogenous Architectures},
    School = {EECS Department, University of California, Berkeley},
    Year = {2022},
    Month = {May},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-100.html},
    Number = {UCB/EECS-2022-100},
    Abstract = {The growth of machine learning workloads, specifically deep neural networks (DNNs), in both warehouse-scale computing (WSC) and edge mobile computing has driven huge demand for different types of accelerators. This project explores the different levels of parallelism available when running deep learning inference on heterogeneous architectures and characterizes the coordination of distinct accelerators across varying workloads. We have implemented an accelerated depthwise convolution kernel on a vector accelerator and explored the design space of executing MobileNetV2 in different configurations on an architecture consisting of both a systolic and a vector accelerator. This work examines shared resource contention at the memory level on this architecture and analyzes the effects of model pipelining and batch parallelism. Through layer-by-layer performance and cache analysis we identify the best parameters and configurations for MobileNetV2 inference, observing 1.4x and 3.5x speedups over a naively accelerated baseline on single-core and multi-core SoCs, respectively.}
}
EndNote citation:
%0 Thesis
%A Nandakumar, Avinash
%E Shao, Sophia
%E Nikolic, Borivoje
%T Accelerating Deep Learning on Heterogenous Architectures
%I EECS Department, University of California, Berkeley
%D 2022
%8 May 13
%@ UCB/EECS-2022-100
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-100.html
%F Nandakumar:EECS-2022-100