Code Optimizers and Register Organizations for Vector Architectures

Corinna Grace Lee

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-92-686
May 1992

http://www2.eecs.berkeley.edu/Pubs/TechRpts/1992/CSD-92-686.pdf

A major challenge facing computer architects today is designing cost-effective hardware that executes multiple operations simultaneously. The goal of such designs is to improve performance by taking advantage of fine-grain parallelism. In this dissertation, I study vector architectures, the oldest of several processor designs that support fine-grain parallelism. Because implementing a cost-effective processor that performs well requires studying not only the design of processors but also the design of algorithms for compilers, this dissertation encompasses aspects of both hardware and software design.

In the first half of this dissertation, I demonstrate that a vector architecture is a cost-effective processor that supports fine-grain parallelism. I show that implementing a vector architecture is no more costly than implementing a superscalar architecture, which is currently popular among designers of VLSI microprocessors. I then show that programs that are rich in parallelism tend also to be vectorizable and are also the ones that execute the longest in a workload, thus demonstrating further the effectiveness of vector architectures. Finally, I show that superpipelined hardware in combination with a vector architecture can take advantage of what little parallelism is available in non-vectorizable programs.

In the second half of this dissertation, I investigate the cost and performance of different organizations for a vector register file in the Cray Y-MP vector processor, an investigation that emphasizes the interaction between processor design and compiler algorithms. After showing that instruction scheduling has a major impact on how effectively more vector registers can be used, I present data from simulation experiments indicating that 16 vector registers and a list scheduling algorithm can improve performance significantly over that of 8 vector registers and the scheduling algorithm used in the Cray vectorizing compiler. I also investigate the usage of an alternative register organization, called a partitioned vector register file, which is less costly to implement than a traditional one but places some restrictions on accessing vector registers. To circumvent this restrictive access, I develop an algorithm for assigning vector registers and present data showing that, when using my algorithm, the performance of a partitioned vector register file is comparable to that of a traditional one.

Advisor: David A. Patterson


BibTeX citation:

@phdthesis{Lee:CSD-92-686,
    Author = {Lee, Corinna Grace},
    Title = {Code Optimizers and Register Organizations for Vector Architectures},
    School = {EECS Department, University of California, Berkeley},
    Year = {1992},
    Month = {May},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/1992/6272.html},
    Number = {UCB/CSD-92-686},
    Abstract = {A major challenge facing computer architects today is designing cost-effective hardware that executes multiple operations simultaneously. The goal of such designs is to improve performance by taking advantage of fine-grain parallelism. In this dissertation, I study vector architectures, the oldest of several processor designs that support fine-grain parallelism. Because implementing a cost-effective processor that performs well requires studying not only the design of processors but also the design of algorithms for compilers, this dissertation encompasses aspects of both hardware and software design. In the first half of this dissertation, I demonstrate that a vector architecture is a cost-effective processor that supports fine-grain parallelism. I show that implementing a vector architecture is no more costly than implementing a superscalar architecture, which is currently popular among designers of VLSI microprocessors. I then show that programs that are rich in parallelism tend also to be vectorizable and are also the ones that execute the longest in a workload, thus demonstrating further the effectiveness of vector architectures. Finally, I show that superpipelined hardware in combination with a vector architecture can take advantage of what little parallelism is available in non-vectorizable programs. In the second half of this dissertation, I investigate the cost and performance of different organizations for a vector register file in the Cray Y-MP vector processor, an investigation that emphasizes the interaction between processor design and compiler algorithms. After showing that instruction scheduling has a major impact on how effectively more vector registers can be used, I present data from simulation experiments indicating that 16 vector registers and a list scheduling algorithm can improve performance significantly over that of 8 vector registers and the scheduling algorithm used in the Cray vectorizing compiler.
I also investigate the usage of an alternative register organization, called a partitioned vector register file, which is less costly to implement than a traditional one but places some restrictions on accessing vector registers. To circumvent this restrictive access, I develop an algorithm for assigning vector registers and present data showing that, when using my algorithm, the performance of a partitioned vector register file is comparable to that of a traditional one.}
}

EndNote citation:

%0 Thesis
%A Lee, Corinna Grace
%T Code Optimizers and Register Organizations for Vector Architectures
%I EECS Department, University of California, Berkeley
%D 1992
%@ UCB/CSD-92-686
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/1992/6272.html
%F Lee:CSD-92-686