Nested-Parallelism PageRank on RISC-V Vector Multi-Processors
Alon Amid
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2019-6
April 19, 2019
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-6.pdf
Graph processing kernels and sparse-representation linear algebra workloads such as PageRank are increasingly used in machine learning and graph analytics contexts. While data-parallel processing and chip-multiprocessors have both been used in recent years as complementary mitigations to the slowing rate of single-thread performance improvements, they have been used together most efficiently on dense data-structure representations as opposed to sparse representations. This work presents nested-parallelism implementations of PageRank for RISC-V multi-processor Rocket Chip SoCs with vector architecture accelerators. These software implementations are used for hardware and software design-space exploration using FPGA-accelerated simulation with multiple silicon-proven multi-processor SoC configurations. The design space includes a variety of scalar cores, vector accelerator cores, and cache parameters, as well as multiple software implementations with tunable parallelism parameters. This report shows the benefits of the loop-raking vectorization technique compared to an alternative vectorization technique, and presents up to a 14x run-time speedup relative to a parallel-scalar implementation running on the same SoC configuration. A 25x speedup is demonstrated in a dual-tile SoC with dual-lanes-per-tile vector accelerators, compared to a minimal scalar implementation, demonstrating the scalability of the proposed nested-parallelism techniques.
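For readers unfamiliar with the kernel named in the abstract, the following is a minimal scalar sketch of PageRank over a sparse (CSR) graph representation. It is not taken from the report: the function name `pagerank_csr`, its parameters, the damping value of 0.85, the uniform treatment of vertices with no outgoing edges, and the pull-style formulation are all assumptions made for illustration. It only shows the two loop levels that a nested-parallel scheme could plausibly target: an outer loop over vertices whose iterations are independent, and an inner loop that gathers through an index array.

```python
def pagerank_csr(row_ptr, col_idx, out_degree, damping=0.85, iterations=50):
    """Pull-style PageRank over a CSR graph of incoming edges (illustrative only).

    col_idx[row_ptr[v]:row_ptr[v + 1]] holds the source vertices of the
    edges that point at vertex v. All names here are hypothetical.
    """
    n = len(row_ptr) - 1
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Each source shares its current rank equally among its outgoing edges.
        contrib = [rank[u] / out_degree[u] if out_degree[u] else 0.0
                   for u in range(n)]
        # Assumption: rank held by vertices without outgoing edges is spread uniformly.
        dangling = sum(rank[u] for u in range(n) if out_degree[u] == 0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = [0.0] * n
        for v in range(n):  # outer loop: iterations are independent per vertex
            acc = 0.0
            for e in range(row_ptr[v], row_ptr[v + 1]):  # inner loop: sparse gather
                acc += contrib[col_idx[e]]
            new_rank[v] = base + damping * acc
        rank = new_rank
    return rank
```

As a quick check, calling `pagerank_csr([0, 1, 2, 3], [2, 0, 1], [1, 1, 1])` on a three-vertex cycle leaves every rank at one third, since each vertex receives exactly what it gives away.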
Advisors: Borivoje Nikolic and Krste Asanović
BibTeX citation:
@mastersthesis{Amid:EECS-2019-6,
    Author = {Amid, Alon},
    Editor = {Nikolic, Borivoje and Asanović, Krste},
    Title = {Nested-Parallelism PageRank on RISC-V Vector Multi-Processors},
    School = {EECS Department, University of California, Berkeley},
    Year = {2019},
    Month = {Apr},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-6.html},
    Number = {UCB/EECS-2019-6},
    Abstract = {Graph processing kernels and sparse-representation linear algebra workloads such as PageRank are increasingly used in machine learning and graph analytics contexts. While data-parallel processing and chip-multiprocessors have both been used in recent years as complementary mitigations to the slowing rate of single-thread performance improvements, they have been used together most efficiently on dense data-structure representations as opposed to sparse representations. This work presents nested-parallelism implementations of PageRank for RISC-V multi-processor Rocket Chip SoCs with vector architecture accelerators. These software implementations are used for hardware and software design-space exploration using FPGA-accelerated simulation with multiple silicon-proven multi-processor SoC configurations. The design space includes a variety of scalar cores, vector accelerator cores, and cache parameters, as well as multiple software implementations with tunable parallelism parameters. This report shows the benefits of the loop-raking vectorization technique compared to an alternative vectorization technique, and presents up to a 14x run-time speedup relative to a parallel-scalar implementation running on the same SoC configuration. A 25x speedup is demonstrated in a dual-tile SoC with dual-lanes-per-tile vector accelerators, compared to a minimal scalar implementation, demonstrating the scalability of the proposed nested-parallelism techniques.},
}
EndNote citation:
%0 Thesis
%A Amid, Alon
%E Nikolic, Borivoje
%E Asanović, Krste
%T Nested-Parallelism PageRank on RISC-V Vector Multi-Processors
%I EECS Department, University of California, Berkeley
%D 2019
%8 April 19
%@ UCB/EECS-2019-6
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-6.html
%F Amid:EECS-2019-6