Hardware Accelerators for Graph Convolutional Networks

Kareem Ahmad

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2021-148

May 21, 2021

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-148.pdf

Most datasets in real-world systems have relationships that are not Euclidean in nature, and are instead best described using graphs. The development of Graph Convolutional Networks (GCNs) has proven to be an efficient approach to learning on graph-structured data. Due to the sparse nature of graphs, however, traditional systolic-array based matrix-algebra accelerators do not achieve high levels of utilization when running inference on GCNs. In this paper, we characterize the performance of GCNs in terms of their four major operations: dense direct memory access (dDMA), sparse direct memory access (sDMA), dense-dense matrix multiplication (GeMM), and sparse-dense matrix multiplication (SpMM), laying the groundwork for adding efficient GCN support to Gemmini, a configurable systolic-array based GeMM accelerator. We also propose the addition of a sparse-to-dense decompression DMA Engine to Gemmini, providing a reference implementation in Spike, the RISC-V ISA-level simulator, along with C tests.
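To illustrate the sparse-to-dense decompression that such a DMA engine would perform, the sketch below expands a CSR-encoded sparse matrix into a dense row-major buffer in C. This is a software illustration of the concept only, not the report's Spike implementation; the function name and memory layout are assumptions for the example.

```c
#include <assert.h>
#include <string.h>

/* Illustrative sketch: decompress a CSR-encoded sparse matrix into a
 * dense row-major buffer, mirroring what a sparse-to-dense decompression
 * DMA engine would do in hardware. Names and layout are hypothetical. */
void csr_to_dense(int rows, int cols,
                  const int *row_ptr, const int *col_idx,
                  const float *vals, float *dense) {
    /* Zero-fill the destination, then scatter the stored nonzeros. */
    memset(dense, 0, sizeof(float) * (size_t)rows * (size_t)cols);
    for (int r = 0; r < rows; r++) {
        for (int i = row_ptr[r]; i < row_ptr[r + 1]; i++) {
            dense[r * cols + col_idx[i]] = vals[i];
        }
    }
}
```

Once decompressed, the dense buffer can be fed to an ordinary GeMM pipeline, which is the motivation for placing the decompression step in the DMA path rather than in the compute array.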

Advisor: Sophia Shao


BibTeX citation:

@mastersthesis{Ahmad:EECS-2021-148,
    Author= {Ahmad, Kareem},
    Title= {Hardware Accelerators for Graph Convolutional Networks},
    School= {EECS Department, University of California, Berkeley},
    Year= {2021},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-148.html},
    Number= {UCB/EECS-2021-148},
    Abstract= {Most datasets in real-world systems have relationships that are not Euclidean in nature, and are instead best described using graphs. The development of Graph Convolutional Networks (GCNs) has proven to be an efficient approach to learning on graph-structured data. Due to the sparse nature of graphs, however, traditional systolic-array based matrix-algebra accelerators do not achieve high levels of utilization when running inference on GCNs. In this paper, we characterize the performance of GCNs in terms of their four major operations: dense direct memory access (dDMA), sparse direct memory access (sDMA), dense-dense matrix multiplication (GeMM), and sparse-dense matrix multiplication (SpMM), laying the groundwork for adding efficient GCN support to Gemmini, a configurable systolic-array based GeMM accelerator. We also propose the addition of a sparse-to-dense decompression DMA Engine to Gemmini, providing a reference implementation in Spike, the RISC-V ISA-level simulator, along with C tests.},
}

EndNote citation:

%0 Thesis
%A Ahmad, Kareem 
%T Hardware Accelerators for Graph Convolutional Networks
%I EECS Department, University of California, Berkeley
%D 2021
%8 May 21
%@ UCB/EECS-2021-148
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-148.html
%F Ahmad:EECS-2021-148