Vasily Volkov and James Demmel

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2007-179

December 29, 2007

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-179.pdf

Graphical Processing Units (GPUs) potentially promise widespread and inexpensive high performance computation. However, architectural limitations (only some operations and memory access patterns can be performed quickly, partial support for IEEE floating point arithmetic) make it necessary to change existing algorithms to attain high performance and correctness. Here we show how to make the bisection algorithm for eigenvalues of symmetric tridiagonal matrices (sstebz from LAPACK) run both fast and correctly on an ATI Radeon X1900 GPU. Our fastest algorithm takes up to 156x less time than Intel's Math Kernel Library version of sstebz running on the CPU, but does so by doing many redundant floating point operations compared to the CPU version. We use an automatic tuning procedure analogous to ATLAS or PHiPAC to decide the optimal redundancy. Correctness despite partial IEEE floating point semantics required explicitly adding 0 in the inner loop. The problems and solutions discussed here are of interest on other GPU architectures.


BibTeX citation:

@techreport{Volkov:EECS-2007-179,
    Author= {Volkov, Vasily and Demmel, James},
    Title= {Using GPUs to Accelerate the Bisection Algorithm for Finding Eigenvalues of Symmetric Tridiagonal Matrices},
    Year= {2007},
    Month= {Dec},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-179.html},
    Number= {UCB/EECS-2007-179},
    Abstract= {Graphical Processing Units (GPUs) potentially promise widespread and inexpensive high performance computation. However, architectural limitations (only some operations and memory access patterns can be performed quickly, partial support for IEEE floating point arithmetic) make it necessary to change existing algorithms to attain high performance and correctness. Here we show how to make the bisection algorithm for eigenvalues of symmetric tridiagonal matrices (sstebz from LAPACK) run both fast and correctly on an ATI Radeon X1900 GPU. Our fastest algorithm takes up to 156x less time than Intel's Math Kernel Library version of sstebz running on the CPU, but does so by doing many redundant floating point operations compared to the CPU version. We use an automatic tuning procedure analogous to ATLAS or PHiPAC to decide the optimal redundancy. Correctness despite partial IEEE floating point semantics required explicitly adding 0 in the inner loop. The problems and solutions discussed here are of interest on other GPU architectures.},
}

EndNote citation:

%0 Report
%A Volkov, Vasily 
%A Demmel, James 
%T Using GPUs to Accelerate the Bisection Algorithm for Finding Eigenvalues of Symmetric Tridiagonal Matrices
%I EECS Department, University of California, Berkeley
%D 2007
%8 December 29
%@ UCB/EECS-2007-179
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-179.html
%F Volkov:EECS-2007-179