The Renewed Case for the Reduced Instruction Set Computer: Avoiding ISA Bloat with Macro-Op Fusion for RISC-V
Christopher Celio and Daniel Dabbelt and David A. Patterson and Krste Asanović
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2016-130
July 8, 2016
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-130.pdf
This report makes the case that a well-designed Reduced Instruction Set Computer (RISC) can match, and even exceed, the performance and code density of existing commercial Complex Instruction Set Computers (CISC) while maintaining the simplicity and cost-effectiveness that underpins the original RISC goals.
We begin by comparing the dynamic instruction counts and dynamic instruction bytes fetched for the popular proprietary ARMv7, ARMv8, IA-32, and x86-64 Instruction Set Architectures (ISAs) against the free and open RISC-V RV64G and RV64GC ISAs when running the SPEC CINT2006 benchmark suite. RISC-V was designed as a very small ISA to support a wide range of implementations, and has a less mature compiler toolchain. However, we observe that on SPEC CINT2006 RV64G executes on average 16% more instructions than x86-64, 3% more instructions than IA-32, 9% more instructions than ARMv8, but 4% fewer instructions than ARMv7.
CISC x86 implementations break up complex instructions into smaller internal RISC-like micro-ops, and the RV64G instruction count is within 2% of the x86-64 retired micro-op count. RV64GC, the compressed variant of RV64G, is the densest ISA studied, fetching 8% fewer dynamic instruction bytes than x86-64. We observed that much of the increased RISC-V instruction count is due to a small set of common multi-instruction idioms.
Exploiting this fact, the RV64G and RV64GC effective instruction count can be reduced by 5.4% on average by leveraging macro-op fusion. Combining the compressed RISC-V ISA extension with macro-op fusion provides both the densest ISA and the fewest dynamic operations retired per program, reducing the motivation to add more instructions to the ISA. This approach retains a single simple ISA suitable for both low-end and high-end implementations, where high-end implementations can boost performance through microarchitectural techniques.
BibTeX citation:
@techreport{Celio:EECS-2016-130, Author= {Celio, Christopher and Dabbelt, Daniel and Patterson, David A. and Asanović, Krste}, Title= {The Renewed Case for the Reduced Instruction Set Computer: Avoiding ISA Bloat with Macro-Op Fusion for RISC-V}, Year= {2016}, Month= {Jul}, Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-130.html}, Number= {UCB/EECS-2016-130}, Abstract= {This report makes the case that a well-designed Reduced Instruction Set Computer (RISC) can match, and even exceed, the performance and code density of existing commercial Complex Instruction Set Computers (CISC) while maintaining the simplicity and cost-effectiveness that underpins the original RISC goals. We begin by comparing the dynamic instruction counts and dynamic instruction bytes fetched for the popular proprietary ARMv7, ARMv8, IA-32, and x86-64 Instruction Set Architectures (ISAs) against the free and open RISC-V RV64G and RV64GC ISAs when running the SPEC CINT2006 benchmark suite. RISC-V was designed as a very small ISA to support a wide range of implementations, and has a less mature compiler toolchain. However, we observe that on SPEC CINT2006 RV64G executes on average 16% more instructions than x86-64, 3% more instructions than IA-32, 9% more instructions than ARMv8, but 4% fewer instructions than ARMv7. CISC x86 implementations break up complex instructions into smaller internal RISC-like micro-ops, and the RV64G instruction count is within 2% of the x86-64 retired micro-op count. RV64GC, the compressed variant of RV64G, is the densest ISA studied, fetching 8% fewer dynamic instruction bytes than x86-64. We observed that much of the increased RISC-V instruction count is due to a small set of common multi-instruction idioms. Exploiting this fact, the RV64G and RV64GC effective instruction count can be reduced by 5.4% on average by leveraging macro-op fusion. Combining the compressed RISC-V ISA extension with macro-op fusion provides both the densest ISA and the fewest dynamic operations retired per program, reducing the motivation to add more instructions to the ISA. This approach retains a single simple ISA suitable for both low-end and high-end implementations, where high-end implementations can boost performance through microarchitectural techniques.}, }
EndNote citation:
%0 Report %A Celio, Christopher %A Dabbelt, Daniel %A Patterson, David A. %A Asanović, Krste %T The Renewed Case for the Reduced Instruction Set Computer: Avoiding ISA Bloat with Macro-Op Fusion for RISC-V %I EECS Department, University of California, Berkeley %D 2016 %8 July 8 %@ UCB/EECS-2016-130 %U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-130.html %F Celio:EECS-2016-130