Alon Amid

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2019-6

April 19, 2019

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-6.pdf

Graph processing kernels and sparse-representation linear algebra workloads such as PageRank are increasingly used in machine learning and graph analytics contexts. While data-parallel processing and chip-multiprocessors have both been used in recent years as complementary mitigations for the slowing rate of single-thread performance improvements, they have been used together most efficiently on dense data-structure representations as opposed to sparse representations. This work presents nested-parallelism implementations of PageRank for RISC-V multi-processor Rocket Chip SoCs with vector architecture accelerators. These software implementations are used for hardware and software design-space exploration using FPGA-accelerated simulation with multiple silicon-proven multi-processor SoC configurations. The design space includes a variety of scalar cores, vector accelerator cores, and cache parameters, as well as multiple software implementations with tunable parallelism parameters. This report shows the benefits of the loop-raking vectorization technique compared to an alternative vectorization technique, and presents up to a 14x run-time speedup relative to a parallel-scalar implementation running on the same SoC configuration. A 25x speedup is demonstrated in a dual-tile SoC with dual-lanes-per-tile vector accelerators, compared to a minimal scalar implementation, demonstrating the scalability of the proposed nested-parallelism techniques.
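For readers unfamiliar with the workload, the following is a minimal sketch of PageRank over a sparse (CSR) matrix, written to expose the two loop levels that a nested-parallelism scheme can target. It is not the report's implementation and does not use loop raking: the function name `pagerank_csr`, the `num_chunks` parameter, the damping factor of 0.85, and the pull-style formulation over incoming edges are all assumptions chosen for illustration. The outer chunk loop stands in for work that could be split across cores or tiles, and the inner gather-and-reduce loop is the kind of loop a vector unit might accelerate.

```python
def pagerank_csr(row_ptr, col_idx, out_degree,
                 damping=0.85, iters=50, num_chunks=2):
    """Pull-style PageRank (illustrative sketch, not the report's code).

    row_ptr, col_idx: CSR of the transposed adjacency matrix, so row v
                      lists every source u that has an edge u -> v.
    out_degree[u]:    number of outgoing edges of vertex u.
    num_chunks:       hypothetical number of row chunks (stand-in for
                      threads or tiles); it does not change the result.
    """
    n = len(row_ptr) - 1
    rank = [1.0 / n] * n
    chunk = (n + num_chunks - 1) // num_chunks  # rows per chunk, rounded up

    for _ in range(iters):
        # Per-vertex contribution, computed once per iteration.
        contrib = [rank[u] / out_degree[u] if out_degree[u] else 0.0
                   for u in range(n)]
        # Rank held by vertices with no out-edges is spread uniformly.
        dangling = sum(rank[u] for u in range(n) if out_degree[u] == 0)
        base = (1.0 - damping) / n + damping * dangling / n

        new_rank = [0.0] * n
        # Outer level: independent row chunks (thread-level parallelism).
        for c in range(num_chunks):
            for v in range(c * chunk, min(n, (c + 1) * chunk)):
                # Inner level: sparse gather and reduction over the
                # nonzeros of row v (data-level parallelism candidate).
                acc = 0.0
                for k in range(row_ptr[v], row_ptr[v + 1]):
                    acc += contrib[col_idx[k]]
                new_rank[v] = base + damping * acc
        rank = new_rank
    return rank


# Example: a 3-vertex directed cycle 0 -> 1 -> 2 -> 0.
# Incoming edges: vertex 0 from 2, vertex 1 from 0, vertex 2 from 1.
cycle_ranks = pagerank_csr([0, 1, 2, 3], [2, 0, 1], [1, 1, 1])
```

The irregular, data-dependent accesses in the inner loop (`contrib[col_idx[k]]`) are what make sparse workloads harder to vectorize than dense ones, which is the difficulty the abstract refers to.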

Advisors: Borivoje Nikolic and Krste Asanović


BibTeX citation:

@mastersthesis{Amid:EECS-2019-6,
    Author= {Amid, Alon},
    Editor= {Nikolic, Borivoje and Asanović, Krste},
    Title= {Nested-Parallelism PageRank on RISC-V Vector Multi-Processors},
    School= {EECS Department, University of California, Berkeley},
    Year= {2019},
    Month= {Apr},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-6.html},
    Number= {UCB/EECS-2019-6},
    Abstract= {Graph processing kernels and sparse-representation linear algebra workloads such as PageRank are increasingly used in machine learning and graph analytics contexts. While data-parallel processing and chip-multiprocessors have both been used in recent years as complementary mitigations for the slowing rate of single-thread performance improvements, they have been used together most efficiently on dense data-structure representations as opposed to sparse representations. This work presents nested-parallelism implementations of PageRank for RISC-V multi-processor Rocket Chip SoCs with vector architecture accelerators. These software implementations are used for hardware and software design-space exploration using FPGA-accelerated simulation with multiple silicon-proven multi-processor SoC configurations. The design space includes a variety of scalar cores, vector accelerator cores, and cache parameters, as well as multiple software implementations with tunable parallelism parameters. This report shows the benefits of the loop-raking vectorization technique compared to an alternative vectorization technique, and presents up to a 14x run-time speedup relative to a parallel-scalar implementation running on the same SoC configuration. A 25x speedup is demonstrated in a dual-tile SoC with dual-lanes-per-tile vector accelerators, compared to a minimal scalar implementation, demonstrating the scalability of the proposed nested-parallelism techniques.},
}

EndNote citation:

%0 Thesis
%A Amid, Alon 
%E Nikolic, Borivoje 
%E Asanović, Krste 
%T Nested-Parallelism PageRank on RISC-V Vector Multi-Processors
%I EECS Department, University of California, Berkeley
%D 2019
%8 April 19
%@ UCB/EECS-2019-6
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-6.html
%F Amid:EECS-2019-6