Low-complexity Vector Microprocessor Extensions

Joseph James Gebis

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2008-47

May 6, 2008

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-47.pdf

For the last few years, single-thread performance has been improving at a snail's pace. Power limitations, increasing relative memory latency, and the exhaustion of improvement in instruction-level parallelism are forcing microprocessor architects to examine new processor design strategies. In this dissertation, I take a look at a technology that can improve the efficiency of modern microprocessors: vectors. Vectors are a simple, power-efficient way to take advantage of common data-level parallelism in an extensible, easily programmable manner. My work focuses on the process of transitioning from traditional scalar microprocessors to computers that can take advantage of vectors.

First, I describe a process for extending existing single-instruction, multiple-data instruction sets to support full vector processing, in a way that remains binary compatible with existing applications. Initial implementations can be low cost, yet can be transparently extended to higher performance later.

I also describe ViVA, the Virtual Vector Architecture. ViVA adds vector-style memory operations to existing microprocessors but does not include arithmetic datapaths; instead, memory instructions work with a new buffer placed between the core and second-level cache. ViVA serves as a low-cost solution to getting much of the performance of full vector memory hierarchies while avoiding the complexity of adding a full vector system.

Finally, I test the performance of ViVA by modifying a cycle-accurate full-system simulator to support ViVA's operation. After extensive calibration, I test the basic performance of ViVA using a series of microbenchmarks. I compare the performance of a variety of ViVA configurations for corner turn, used in processing multidimensional data, and sparse matrix-vector multiplication, used in many scientific applications. Results show that ViVA can give significant benefit for a variety of memory access patterns, without relying on a costly hardware prefetcher.
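
For readers unfamiliar with the two kernels named above, the following is a minimal reference sketch of what each one computes. It is not code from the dissertation and does not model ViVA or its buffer. The function names are hypothetical, and the compressed sparse row (CSR) layout is an assumption, since the abstract does not say which sparse format was used. The point is only to show the memory access patterns involved: strided writes in the corner turn, and indirect reads through an index array in the sparse multiply.

```python
def corner_turn(matrix):
    """Corner turn: transpose a dense matrix given as a list of rows.

    Reads walk each input row sequentially; writes land in a different
    output row on every step, which is the strided pattern that makes
    this kernel hard on a conventional cache hierarchy.
    """
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    out = [[0] * rows for _ in range(cols)]
    for i in range(rows):
        for j in range(cols):
            out[j][i] = matrix[i][j]
    return out


def spmv_csr(values, col_idx, row_ptr, x):
    """Sparse matrix-vector multiply, y = A * x, with A in CSR form.

    CSR is an assumed format for illustration:
      values  -- nonzero entries, stored row by row
      col_idx -- column index of each entry in values
      row_ptr -- row r occupies values[row_ptr[r]:row_ptr[r + 1]]
    The read x[col_idx[k]] is an indirect (gather) access.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y
```

As a usage example, the 3x3 matrix with rows (10, 0, 0), (0, 20, 30), (0, 0, 40) is stored in CSR as values = [10, 20, 30, 40], col_idx = [0, 1, 2, 2], row_ptr = [0, 1, 3, 4].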

Advisor: David A. Patterson

BibTeX citation:

@phdthesis{Gebis:EECS-2008-47,
    Author= {Gebis, Joseph James},
    Title= {Low-complexity Vector Microprocessor Extensions},
    School= {EECS Department, University of California, Berkeley},
    Year= {2008},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-47.html},
    Number= {UCB/EECS-2008-47},
    Abstract= {For the last few years, single-thread performance has been improving at a
snail's pace.  Power limitations, increasing relative memory latency, and
the exhaustion of improvement in instruction-level parallelism are
forcing microprocessor architects to examine new processor design
strategies.  In this dissertation, I take a look at a technology that can
improve the efficiency of modern microprocessors: vectors.  Vectors are a
simple, power-efficient way to take advantage of common data-level
parallelism in an extensible, easily programmable manner.  My work
focuses on the process of transitioning from traditional scalar
microprocessors to computers that can take advantage of vectors.

First, I describe a process for extending existing single-instruction,
multiple-data instruction sets to support full vector processing, in a
way that remains binary compatible with existing applications.  Initial
implementations can be low cost, yet can be transparently extended to
higher performance later.

I also describe ViVA, the Virtual Vector Architecture.  ViVA adds
vector-style memory operations to existing microprocessors but does not
include arithmetic datapaths; instead, memory instructions work with a
new buffer placed between the core and second-level cache.  ViVA serves
as a low-cost solution to getting much of the performance of full vector
memory hierarchies while avoiding the complexity of adding a full vector
system.

Finally, I test the performance of ViVA by modifying a cycle-accurate
full-system simulator to support ViVA's operation.  After extensive
calibration, I test the basic performance of ViVA using a series of
microbenchmarks.  I compare the performance of a variety of ViVA
configurations for corner turn, used in processing multidimensional data,
and sparse matrix-vector multiplication, used in many scientific
applications.  Results show that ViVA can give significant benefit for a
variety of memory access patterns, without relying on a costly hardware
prefetcher.},
}

EndNote citation:

%0 Thesis
%A Gebis, Joseph James
%T Low-complexity Vector Microprocessor Extensions
%I EECS Department, University of California, Berkeley
%D 2008
%8 May 6
%@ UCB/EECS-2008-47
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-47.html
%F Gebis:EECS-2008-47