Processor Design Tradeoffs in VLSI

Robert Warren Sherburne, Jr.

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-84-173
April 1984

http://www2.eecs.berkeley.edu/Pubs/TechRpts/1984/CSD-84-173.pdf

As the density of circuit integration is increased, management of complexity becomes a critical issue in chip design. Hundreds of man-years of design time are required for complex processors which are presently available on a few chips. This high cost of manpower and other resources is not acceptable. In order to address this problem, the Reduced Instruction Set Computer (RISC) architecture relies on a small set of simple instructions which execute in a regular manner. This allows a powerful processor to be implemented on a single chip at a cost of only a few man years. A critical factor be- hind the success of the RISC II microprocessor is the careful optimization on a single chip at a cost of only a few man years A critical factor behind the success of the RISC II microprocessor is the careful optimization which was performed during its design. Allocation of the limited chip area and power resources must be is the careful optimization which was performed during its design. Allocation of the limited chip area and power resources must be carefully performed to ensure that all processor instructions operate at the fastest possible speed. A fast implementation alone, however, is not sufficient; the designer must also overall performance for typical applications in order to ensure best results. Areas of processor design which are analyzed in this work include System Pipelining, Local Memory Tradeoffs, Datapath Timing, and ALU Design Tradeoffs. Pipelining improves performance by increasing the utilization of the datapath resources. This gain is diminished, however, by data and instruction dependencies which require extra cycles of delay during instruction execution. Also, the larger register file bitcells which are needed in order to support concurrency in the datapath incur greater delays and reduce system bandwidth from the expected value. Increased local memory (or register file) capacity significantly reduces data I/O traffic by keeping needed data frequently in registers on the chip. Too much local memory, though, can actually reduce system throughput by increasing the datapath cycle time. Various ALU organizations are available to the designer; here several approaches are investigated as to their suitability for VLSI. Carry delay as well as power, area, and regularity issues are examined for ripple, carry-select, and parallel adder designs. First, a traditional, fixed-gate delay analysis of carry computation is performed over a range of adder sizes. Next, delays are measured for NMOS implementations utilizing dynamic logic and bootstrapping techniques. The results differ widely: the fixed-delay model shows the parallel design to be superior for adders of 16 bits and up, while the NMOS analysis showed it to be outperformed by the carry-select design through 128 bits. Such a result underscores the need to reevaluate design strategies which were traditionally chosen for TTL-based implementations. Single-chip VLSI implementations impose a whole new set of It is hoped that this work will bring out the significance of evaluating the design tradeoffs over the whole spectrum ranging from the selection of a processor architecture down to the choice of the carry circuitry in the ALU. In this research I was supported for three years by a General Electric doctoral fellowship. The RISC project was supported in part by ARPA Order No. 3803 and monitored by NESC #N00039-78-G-0013-0004.

Advisor: Carlo H. Séquin


BibTeX citation:

@phdthesis{Sherburne, Jr.:CSD-84-173,
    Author = {Sherburne, Jr., Robert Warren},
    Title = {Processor Design Tradeoffs in VLSI},
    School = {EECS Department, University of California, Berkeley},
    Year = {1984},
    Month = {Apr},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/1984/5965.html},
    Number = {UCB/CSD-84-173},
    Abstract = {As the density of circuit integration is increased, management of complexity becomes a critical issue in chip design. Hundreds of man-years of design time are required for complex processors which are presently available on a few chips. This high cost of manpower and other resources is not acceptable. In order to address this problem, the Reduced Instruction Set Computer (RISC) architecture relies on a small set of simple instructions which execute in a regular manner. This allows a powerful processor to be implemented on a single chip at a cost of only a few man years. A critical factor be- hind the success of the RISC II microprocessor is the careful optimization on a single chip at a cost of only a few man years A critical factor behind the success of the RISC II microprocessor is the careful optimization which was performed during its design.  Allocation of the limited chip area and power resources must be is the careful optimization which was performed during its design.  Allocation of the limited chip area and power resources must be carefully performed to ensure that all processor instructions operate at the fastest possible speed. A fast implementation alone, however, is not sufficient; the designer must also overall performance for typical applications in order to ensure best results. Areas of processor design which are analyzed in this work include System Pipelining, Local Memory Tradeoffs, Datapath Timing, and ALU Design Tradeoffs. Pipelining improves performance by increasing the utilization of the datapath resources. This gain is diminished, however, by data and instruction dependencies which require extra cycles of delay during instruction execution. Also, the larger register file bitcells which are needed in order to support concurrency in the datapath incur greater delays and reduce system bandwidth from the expected value. Increased local memory (or register file) capacity significantly reduces data I/O traffic by keeping needed data frequently in registers on the chip. Too much local memory, though, can actually reduce system throughput by increasing the datapath cycle time. Various ALU organizations are available to the designer; here several approaches are investigated as to their suitability for VLSI. Carry delay as well as power, area, and regularity issues are examined for ripple, carry-select, and parallel adder designs. First, a traditional, fixed-gate delay analysis of carry computation is performed over a range of adder sizes. Next, delays are measured for NMOS implementations utilizing dynamic logic and bootstrapping techniques. The results differ widely:  the fixed-delay model shows the parallel design to be superior for adders of 16 bits and up, while the NMOS analysis showed it to be outperformed by the carry-select design through 128 bits.  Such a result underscores the need to reevaluate design strategies which were traditionally chosen for TTL-based implementations.  Single-chip VLSI implementations impose a whole new set of It is hoped that this work will bring out the significance of evaluating the design tradeoffs over the whole spectrum ranging from the selection of a processor architecture down to the choice of the carry circuitry in the ALU. In this research I was supported for three years by a General Electric doctoral fellowship. The RISC project was supported in part by ARPA Order No. 3803 and monitored by NESC #N00039-78-G-0013-0004.}
}

EndNote citation:

%0 Thesis
%A Sherburne, Jr., Robert Warren
%T Processor Design Tradeoffs in VLSI
%I EECS Department, University of California, Berkeley
%D 1984
%@ UCB/CSD-84-173
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/1984/5965.html
%F Sherburne, Jr.:CSD-84-173