The Renewed Case for the Reduced Instruction Set Computer: Avoiding ISA Bloat with Macro-Op Fusion for RISC-V

Christopher Celio, Daniel Dabbelt, David A. Patterson and Krste Asanović

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2016-130
July 8, 2016

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-130.pdf

This report makes the case that a well-designed Reduced Instruction Set Computer (RISC) can match, and even exceed, the performance and code density of existing commercial Complex Instruction Set Computers (CISC) while maintaining the simplicity and cost-effectiveness that underpins the original RISC goals.

We begin by comparing the dynamic instruction counts and dynamic instruction bytes fetched for the popular proprietary ARMv7, ARMv8, IA-32, and x86-64 Instruction Set Architectures (ISAs) against the free and open RISC-V RV64G and RV64GC ISAs when running the SPEC CINT2006 benchmark suite. RISC-V was designed as a very small ISA to support a wide range of implementations, and has a less mature compiler toolchain. However, we observe that on SPEC CINT2006 RV64G executes on average 16% more instructions than x86-64, 3% more instructions than IA-32, 9% more instructions than ARMv8, but 4% fewer instructions than ARMv7.

CISC x86 implementations break up complex instructions into smaller internal RISC-like micro-ops, and the RV64G instruction count is within 2% of the x86-64 retired micro-op count. RV64GC, the compressed variant of RV64G, is the densest ISA studied, fetching 8% fewer dynamic instruction bytes than x86-64. We observed that much of the increased RISC-V instruction count is due to a small set of common multi-instruction idioms.

Exploiting this fact, the RV64G and RV64GC effective instruction count can be reduced by 5.4% on average by leveraging macro-op fusion. Combining the compressed RISC-V ISA extension with macro-op fusion provides both the densest ISA and the fewest dynamic operations retired per program, reducing the motivation to add more instructions to the ISA. This approach retains a single simple ISA suitable for both low-end and high-end implementations, where high-end implementations can boost performance through microarchitectural techniques.


BibTeX citation:

@techreport{Celio:EECS-2016-130,
    Author = {Celio, Christopher and Dabbelt, Daniel and Patterson, David A. and Asanović, Krste},
    Title = {The Renewed Case for the Reduced Instruction Set Computer: Avoiding ISA Bloat with Macro-Op Fusion for RISC-V},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2016},
    Month = {Jul},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-130.html},
    Number = {UCB/EECS-2016-130},
    Abstract = {This report makes the case that a well-designed Reduced Instruction Set Computer (RISC) can match, and even exceed, the performance and code density of existing commercial Complex Instruction Set Computers (CISC) while maintaining the simplicity and cost-effectiveness that underpins the original RISC goals.

We begin by comparing the dynamic instruction counts and dynamic instruction bytes fetched for the popular proprietary ARMv7, ARMv8, IA-32, and x86-64 Instruction Set Architectures (ISAs) against the free and open RISC-V RV64G and RV64GC ISAs when running the SPEC CINT2006 benchmark suite. RISC-V was designed as a very small ISA to support a wide range of implementations, and has a less mature compiler toolchain. However, we observe that on SPEC CINT2006 RV64G executes on average 16% more instructions than x86-64, 3% more instructions than IA-32, 9% more instructions than ARMv8, but 4% fewer instructions than ARMv7.

CISC x86 implementations break up complex instructions into smaller internal RISC-like micro-ops, and the RV64G instruction count is within 2% of the x86-64 retired micro-op count. RV64GC, the compressed variant of RV64G, is the densest ISA studied, fetching 8% fewer dynamic instruction bytes than x86-64. We observed that much of the increased RISC-V instruction count is due to a small set of common multi-instruction idioms.

Exploiting this fact, the RV64G and RV64GC effective instruction count can be reduced by 5.4% on average by leveraging macro-op fusion. Combining the compressed RISC-V ISA extension with macro-op fusion provides both the densest ISA and the fewest dynamic operations retired per program, reducing the motivation to add more instructions to the ISA. This approach retains a single simple ISA suitable for both low-end and high-end implementations, where high-end implementations can boost performance through microarchitectural techniques.}
}

EndNote citation:

%0 Report
%A Celio, Christopher
%A Dabbelt, Daniel
%A Patterson, David A.
%A Asanović, Krste
%T The Renewed Case for the Reduced Instruction Set Computer: Avoiding ISA Bloat with Macro-Op Fusion for RISC-V
%I EECS Department, University of California, Berkeley
%D 2016
%8 July 8
%@ UCB/EECS-2016-130
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-130.html
%F Celio:EECS-2016-130