Nested-Parallelism PageRank on RISC-V Vector Multi-Processors

Alon Amid

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2019-6
April 19, 2019

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-6.pdf

Graph processing kernels and sparse-representation linear algebra workloads such as PageRank are increasingly used in machine learning and graph analytics contexts. While data-parallel processing and chip-multiprocessors have both been used in recent years as complementary mitigations to the slowing rate of single-thread performance improvements, they have been used together most efficiently on dense data-structure representations as opposed to sparse representations. This work presents nested-parallelism implementations of PageRank for RISC-V multi-processor Rocket Chip SoCs with vector architecture accelerators. These software implementations are used for hardware and software design-space exploration using FPGA-accelerated simulation with multiple silicon-proven multi-processor SoC configurations. The design space includes a variety of scalar cores, vector accelerator cores, and cache parameters, as well as multiple software implementations with tunable parallelism parameters. This report shows the benefits of the loop-raking vectorizing technique compared to an alternative vectorizing technique, and presents up to a 14x run-time speedup relative to a parallel-scalar implementation running on the same SoC configuration. A 25x speedup is demonstrated in a dual-tile SoC with dual-lanes-per-tile vector accelerators, compared to a minimal scalar implementation, demonstrating the scalability of the proposed nested-parallelism techniques.

Advisors: Borivoje Nikolic and Krste Asanović


BibTeX citation:

@mastersthesis{Amid:EECS-2019-6,
    Author = {Amid, Alon},
    Editor = {Nikolic, Borivoje and Asanović, Krste},
    Title = {Nested-Parallelism PageRank on RISC-V Vector Multi-Processors},
    School = {EECS Department, University of California, Berkeley},
    Year = {2019},
    Month = {Apr},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-6.html},
    Number = {UCB/EECS-2019-6},
    Abstract = {Graph processing kernels and sparse-representation linear algebra workloads such as PageRank are increasingly used in machine learning and graph analytics contexts. While data-parallel processing and chip-multiprocessors have both been used in recent years as complementary mitigations to the slowing rate of single-thread performance improvements, they have been used together most efficiently on dense data-structure representations as opposed to sparse representations. This work presents nested-parallelism implementations of PageRank for RISC-V multi-processor Rocket Chip SoCs with vector architecture accelerators. These software implementations are used for hardware and software design-space exploration using FPGA-accelerated simulation with multiple silicon-proven multi-processor SoC configurations. The design space includes a variety of scalar cores, vector accelerator cores, and cache parameters, as well as multiple software implementations with tunable parallelism parameters. This report shows the benefits of the loop-raking vectorizing technique compared to an alternative vectorizing technique, and presents up to a 14x run-time speedup relative to a parallel-scalar implementation running on the same SoC configuration. A 25x speedup is demonstrated in a dual-tile SoC with dual-lanes-per-tile vector accelerators, compared to a minimal scalar implementation, demonstrating the scalability of the proposed nested-parallelism techniques.}
}

EndNote citation:

%0 Thesis
%A Amid, Alon
%E Nikolic, Borivoje
%E Asanović, Krste
%T Nested-Parallelism PageRank on RISC-V Vector Multi-Processors
%I EECS Department, University of California, Berkeley
%D 2019
%8 April 19
%@ UCB/EECS-2019-6
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-6.html
%F Amid:EECS-2019-6