Mixed Precision Vector Processors

Albert Ou

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2015-265

December 19, 2015

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-265.pdf

Mixed-precision computation presents opportunities for programmable accelerators to improve performance and energy efficiency while retaining application flexibility. Building on the Hwacha decoupled vector-fetch accelerator, we introduce high-occupancy vector lanes (HOV), a set of mixed-precision hardware optimizations which support dynamic configuration of multiple architectural register widths and high-throughput operations on packed data. We discuss the implications of HOV for the programming model and describe our microarchitectural approach to maximizing register file utilization and datapath parallelism. Using complete VLSI implementations of HOV in a commercial 28nm process technology, featuring a cache-coherent memory hierarchy with L2 caches and simulated LPDDR3 DRAM modules, we quantify the impact of our HOV enhancements on area, performance, and energy consumption compared to the baseline design, a decoupled vector architecture without mixed-precision support. We observe as much as a 64.3% performance gain and a 61.6% energy reduction over the baseline vector machine on half-precision dense matrix multiplication. We then validate the HOV design against the ARM Mali-T628 MP6 GPU by running a suite of microbenchmarks compiled from the same OpenCL source code using our custom HOV-enabled compiler and the ARM stock compiler.

Advisor: Krste Asanović

BibTeX citation:

@mastersthesis{Ou:EECS-2015-265,
    Author= {Ou, Albert},
    Editor= {Asanović, Krste and Stojanovic, Vladimir},
    Title= {Mixed Precision Vector Processors},
    School= {EECS Department, University of California, Berkeley},
    Year= {2015},
    Month= {Dec},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-265.html},
    Number= {UCB/EECS-2015-265},
    Abstract= {Mixed-precision computation presents opportunities for programmable accelerators to improve performance and energy efficiency while retaining application flexibility. Building on the Hwacha decoupled vector-fetch accelerator, we introduce high-occupancy vector lanes (HOV), a set of mixed-precision hardware optimizations which support dynamic configuration of multiple architectural register widths and high-throughput operations on packed data. We discuss the implications of HOV for the programming model and describe our microarchitectural approach to maximizing register file utilization and datapath parallelism. Using complete VLSI implementations of HOV in a commercial 28nm process technology, featuring a cache-coherent memory hierarchy with L2 caches and simulated LPDDR3 DRAM modules, we quantify the impact of our HOV enhancements on area, performance, and energy consumption compared to the baseline design, a decoupled vector architecture without mixed-precision support. We observe as much as a 64.3% performance gain and a 61.6% energy reduction over the baseline vector machine on half-precision dense matrix multiplication. We then validate the HOV design against the ARM Mali-T628 MP6 GPU by running a suite of microbenchmarks compiled from the same OpenCL source code using our custom HOV-enabled compiler and the ARM stock compiler.},
}

EndNote citation:

%0 Thesis
%A Ou, Albert
%E Asanović, Krste
%E Stojanovic, Vladimir
%T Mixed Precision Vector Processors
%I EECS Department, University of California, Berkeley
%D 2015
%8 December 19
%@ UCB/EECS-2015-265
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-265.html
%F Ou:EECS-2015-265