Coprocessor Architectures for VLSI

Paul Mark Hansen

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-88-466
November 1988

http://www2.eecs.berkeley.edu/Pubs/TechRpts/1988/CSD-88-466.pdf

The hardware resources available on a single chip to implement a VLSI CPU remain scarce, despite rapid technological advances. Reduced Instruction Set Computers (RISCs) reduce complexity and use the chip hardware resources to make the most frequently occurring operations fast. In this dissertation, the RISC philosophy is extended to specialized devices called coprocessors. Coprocessors increase system performance by reducing the number of instructions per program and the number of effective cycles per instruction.
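The two factors named above can be related through the standard performance equation (execution time = instructions × cycles per instruction × cycle time). The sketch below is only an illustration: the function name and every number in it are hypothetical and are not taken from the dissertation.

```python
def execution_time(instructions, cycles_per_instruction, cycle_time_ns):
    """Standard performance equation: time = instructions x CPI x cycle time."""
    return instructions * cycles_per_instruction * cycle_time_ns


# Hypothetical workload: a coprocessor lowers both the instruction count
# and the effective cycles per instruction; the clock is unchanged.
baseline = execution_time(1_000_000, 4.0, 100.0)
with_coprocessor = execution_time(600_000, 2.5, 100.0)
speedup = baseline / with_coprocessor

print(f"speedup: {speedup:.2f}x")
```

Because the two factors multiply, a modest reduction in each compounds into a larger overall speedup.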

A method for evaluating coprocessor performance is developed, including a model that accounts for system, software, and hardware effects. Coprocessor implementations are characterized in terms of effectiveness and utilization by considering operation and overhead time for typical computations.

Performance and interface characteristics of the SPUR floating-point coprocessor implementation of the IEEE Standard are presented and compared to two popular commercial versions by Intel and Motorola. The SPUR FPU is 3 to 50 times faster than the commercial versions for comparable technology and clock rates. For each architecture, the influence on performance of each of the following is identified: the bus width between the floating-point unit and operand storage, the operand transfer protocols implemented in hardware, the concurrent execution model, the speed of the function units, the floating-point instruction semantics, and the data cache service time. Execution time spent in overhead is shown to increase to more than 90% for some architectures if equipped with faster floating-point units. This suggests that, to remain effective, coprocessor interface architectures must change dramatically to keep pace with the rapid advance in CPU execution rates.
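The overhead effect described above follows from simple arithmetic: if only the function unit gets faster while the interface overhead stays fixed, overhead takes up a growing share of total time. The sketch below assumes a simple two-term model (fixed overhead plus operation time divided by a speedup factor). The cycle counts are hypothetical, and the model is not necessarily the one developed in the dissertation.

```python
def overhead_fraction(overhead_cycles, operation_cycles, fpu_speedup):
    """Share of total time spent in overhead when only the FPU is sped up.

    Assumes overhead_cycles stays constant and operation_cycles shrinks
    by fpu_speedup (an illustrative model, not the author's).
    """
    operation = operation_cycles / fpu_speedup
    return overhead_cycles / (overhead_cycles + operation)


# Hypothetical operation: 40 cycles of operand transfer and protocol
# overhead, 60 cycles of arithmetic on the original function unit.
for speedup in (1, 2, 10, 20):
    share = overhead_fraction(40, 60, speedup)
    print(f"FPU {speedup:>2}x faster: {share:.0%} of time is overhead")
```

With these assumed numbers, overhead rises from 40% of execution time to more than 90% once the function unit is 20 times faster.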

The combinatorial optimization problem of finding the shortest path between two vertices in a directed graph is presented. Algorithms for scan-based relaxation techniques and Dijkstra's shortest-path algorithm are considered in detail. A path optimization coprocessor based on the SPUR model is proposed that achieves nearly three orders of magnitude improvement in performance over software implementations, and two to three orders of magnitude improvement in cost with performance comparable to dedicated hardware devices or specialized and multi-computer architectures.
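For reference, a minimal software version of Dijkstra's shortest-path algorithm is sketched below. It shows the kind of software baseline such a coprocessor would be compared against, not the proposed hardware design. The adjacency-list representation, the function name, and the example graph are assumptions made for illustration, and edge weights are assumed to be non-negative.

```python
import heapq


def dijkstra(graph, source, target):
    """Return the shortest-path cost from source to target, or None.

    graph maps each vertex to a list of (neighbor, weight) pairs;
    weights are assumed to be non-negative.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry; a shorter path to u was found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd  # relax edge (u, v)
                heapq.heappush(heap, (nd, v))
    return None  # target is unreachable from source


# Hypothetical directed graph.
example = {
    "a": [("b", 4), ("c", 1)],
    "c": [("b", 2)],
    "b": [("d", 5)],
}
print(dijkstra(example, "a", "d"))
```

The inner loop is the edge relaxation step; most of the running time of a software implementation goes to this step and to the priority queue operations.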

Finally, the SPUR coprocessor architecture is evaluated for three other applications: digital signal processing, vector floating-point arithmetic, and support for the Prolog language.

Advisor: David A. Patterson


BibTeX citation:

@phdthesis{Hansen:CSD-88-466,
    Author = {Hansen, Paul Mark},
    Title = {Coprocessor Architectures for VLSI},
    School = {EECS Department, University of California, Berkeley},
    Year = {1988},
    Month = {Nov},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/1988/6067.html},
    Number = {UCB/CSD-88-466},
    Abstract = {The hardware resources available on a single chip to implement a VLSI CPU remain scarce, despite rapid technological advances. Reduced Instruction Set Computers (RISCs) reduce complexity and use the chip hardware resources to make the most frequently occurring operations fast. In this dissertation, the RISC philosophy is extended to specialized devices called coprocessors. Coprocessors increase system performance by reducing the number of instructions per program and the number of effective cycles per instruction. A method for evaluating coprocessor performance is developed, including a model that accounts for system, software, and hardware effects. Coprocessor implementations are characterized in terms of effectiveness and utilization by considering operation and overhead time for typical computations. Performance and interface characteristics of the SPUR floating-point coprocessor implementation of the IEEE Standard are presented and compared to two popular commercial versions by Intel and Motorola. The SPUR FPU is 3 to 50 times faster than the commercial versions for comparable technology and clock rates. For each architecture, the influence on performance of each of the following is identified: the bus width between the floating-point unit and operand storage, the operand transfer protocols implemented in hardware, the concurrent execution model, the speed of the function units, the floating-point instruction semantics, and the data cache service time. Execution time spent in overhead is shown to increase to more than 90% for some architectures if equipped with faster floating-point units. This suggests that, to remain effective, coprocessor interface architectures must change dramatically to keep pace with the rapid advance in CPU execution rates. The combinatorial optimization problem of finding the shortest path between two vertices in a directed graph is presented. Algorithms for scan-based relaxation techniques and Dijkstra's shortest-path algorithm are considered in detail. A path optimization coprocessor based on the SPUR model is proposed that achieves nearly three orders of magnitude improvement in performance over software implementations, and two to three orders of magnitude improvement in cost with performance comparable to dedicated hardware devices or specialized and multi-computer architectures. Finally, the SPUR coprocessor architecture is evaluated for three other applications: digital signal processing, vector floating-point arithmetic, and support for the Prolog language.}
}

EndNote citation:

%0 Thesis
%A Hansen, Paul Mark
%T Coprocessor Architectures for VLSI
%I EECS Department, University of California, Berkeley
%D 1988
%@ UCB/CSD-88-466
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/1988/6067.html
%F Hansen:CSD-88-466