Accelerating Deep Learning on Heterogenous Architectures
Avinash Nandakumar
EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2022-100
May 13, 2022
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-100.pdf
The growth of machine learning workloads, specifically deep neural networks (DNNs), in both warehouse-scale computing (WSC) and edge mobile computing has driven huge demand for different types of accelerators. This project explores the different levels of parallelism available when running deep learning inference on heterogeneous architectures and characterizes the coordination of distinct accelerators across varying workloads. We have implemented an accelerated depthwise convolution kernel on a vector accelerator and explored the design space of executing MobileNetV2 in different configurations on an architecture consisting of both a systolic and a vector accelerator. This work examines shared resource contention at the memory level on this architecture and analyzes the effects of model pipelining and batch parallelism. Through layer-by-layer performance and cache analysis we identify the best parameters and configurations for MobileNetV2 inference, observing 1.4x and 3.5x speedups over a naively accelerated baseline on single-core and multi-core SoCs, respectively.
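For readers unfamiliar with the kernel at the heart of this work, below is a minimal NumPy sketch of depthwise convolution semantics (NHWC layout, stride 1, no padding). It illustrates the operation the report accelerates on a vector unit; it is not the report's implementation, and all names are illustrative.

import numpy as np

def depthwise_conv2d(x, w):
    # x: (H, W, C) input feature map; w: (KH, KW, C) one filter per channel.
    # Unlike a standard convolution, there is no reduction across channels:
    # channel c of the output depends only on channel c of the input.
    H, W, C = x.shape
    KH, KW, _ = w.shape
    out = np.zeros((H - KH + 1, W - KW + 1, C))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Per-channel multiply-accumulate over the KH x KW window;
            # the channel axis is the natural vectorization dimension.
            out[i, j, :] = (x[i:i + KH, j:j + KW, :] * w).sum(axis=(0, 1))
    return out

# Example: a 3x3 depthwise convolution over a 32-channel feature map.
x = np.random.rand(16, 16, 32)
w = np.random.rand(3, 3, 32)
y = depthwise_conv2d(x, w)   # shape (14, 14, 32)

Because each channel is processed independently, the channel loop maps naturally onto vector lanes, which is what makes this kernel a good fit for a vector accelerator.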
Advisor: Sophia Shao
BibTeX citation:
@mastersthesis{Nandakumar:EECS-2022-100,
    Author = {Nandakumar, Avinash},
    Editor = {Shao, Sophia and Nikolic, Borivoje},
    Title = {Accelerating Deep Learning on Heterogenous Architectures},
    School = {EECS Department, University of California, Berkeley},
    Year = {2022},
    Month = {May},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-100.html},
    Number = {UCB/EECS-2022-100},
    Abstract = {The growth of machine learning workloads, specifically deep neural networks (DNNs), in both warehouse-scale computing (WSC) and edge mobile computing has driven huge demand for different types of accelerators. This project explores the different levels of parallelism available when running deep learning inference on heterogeneous architectures and characterizes the coordination of distinct accelerators across varying workloads. We have implemented an accelerated depthwise convolution kernel on a vector accelerator and explored the design space of executing MobileNetV2 in different configurations on an architecture consisting of both a systolic and a vector accelerator. This work examines shared resource contention at the memory level on this architecture and analyzes the effects of model pipelining and batch parallelism. Through layer-by-layer performance and cache analysis we identify the best parameters and configurations for MobileNetV2 inference, observing 1.4x and 3.5x speedups over a naively accelerated baseline on single-core and multi-core SoCs, respectively.}
}
EndNote citation:
%0 Thesis
%A Nandakumar, Avinash
%E Shao, Sophia
%E Nikolic, Borivoje
%T Accelerating Deep Learning on Heterogenous Architectures
%I EECS Department, University of California, Berkeley
%D 2022
%8 May 13
%@ UCB/EECS-2022-100
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-100.html
%F Nandakumar:EECS-2022-100