Generator-Based Design of Custom Systems-on-Chip for Numerical Data Analysis
Alon Amid
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2022-247
December 1, 2022
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-247.pdf
With the end of Dennard scaling and the subsequent demise of Moore's Law, the continuous demand for higher computing performance and efficiency is increasingly met through specialization of digital processors. In particular, numerical data processing and machine-learning applications incur high computational costs but often have common computational structures, acting as prime targets for hardware customization. Specialization of digital designs is accompanied by substantial non-recurring engineering (NRE) costs, which limit the proliferation of customized designs. This work presents tools and methodologies for the development of custom systems-on-chip (SoCs) for numerical data analysis applications. An integrated generator-based framework for SoC development is demonstrated through SoC customization and hardware/software co-design for numerical data analysis and machine-learning applications. The development of full-system support from hardware accelerators through system software leads to the identification of several co-design opportunities for increasing accelerator utility in custom SoCs. Specifically, we demonstrate the development of high-performance custom software library implementations to support accelerated numerical data analysis on custom SoCs developed using the Chipyard integrated generator-based framework for custom SoC design. We further identify a need to provide support for processing of a high variety of matrix shapes and sizes in SoC deep learning accelerator matrix engines for accelerated processing of numerical data analysis workloads, and demonstrate up to a 1.25x improvement in the utilization of a matrix engine on small and rectangular matrices through hardware-managed static scheduling, dynamic scheduling, and hardware-managed commutative micro-threading.
Advisors: Borivoje Nikolic and Krste Asanović
BibTeX citation:
@phdthesis{Amid:EECS-2022-247,
    Author = {Amid, Alon},
    Title = {Generator-Based Design of Custom Systems-on-Chip for Numerical Data Analysis},
    School = {EECS Department, University of California, Berkeley},
    Year = {2022},
    Month = {Dec},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-247.html},
    Number = {UCB/EECS-2022-247},
    Abstract = {With the end of Dennard scaling and the subsequent demise of Moore's Law, the continuous demand for higher computing performance and efficiency is increasingly met through specialization of digital processors. In particular, numerical data processing and machine-learning applications incur high computational costs but often have common computational structures, acting as prime targets for hardware customization. Specialization of digital designs is accompanied by substantial non-recurring engineering (NRE) costs, which limit the proliferation of customized designs. This work presents tools and methodologies for the development of custom systems-on-chip (SoCs) for numerical data analysis applications. An integrated generator-based framework for SoC development is demonstrated through SoC customization and hardware/software co-design for numerical data analysis and machine-learning applications. The development of full-system support from hardware accelerators through system software leads to the identification of several co-design opportunities for increasing accelerator utility in custom SoCs. Specifically, we demonstrate the development of high-performance custom software library implementations to support accelerated numerical data analysis on custom SoCs developed using the Chipyard integrated generator-based framework for custom SoC design. We further identify a need to provide support for processing of a high variety of matrix shapes and sizes in SoC deep learning accelerator matrix engines for accelerated processing of numerical data analysis workloads, and demonstrate up to a 1.25x improvement in the utilization of a matrix engine on small and rectangular matrices through hardware-managed static scheduling, dynamic scheduling, and hardware-managed commutative micro-threading.},
}
EndNote citation:
%0 Thesis
%A Amid, Alon
%T Generator-Based Design of Custom Systems-on-Chip for Numerical Data Analysis
%I EECS Department, University of California, Berkeley
%D 2022
%8 December 1
%@ UCB/EECS-2022-247
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-247.html
%F Amid:EECS-2022-247