### Jason Riedy

###
EECS Department

University of California, Berkeley

Technical Report No. UCB/EECS-2010-172

December 17, 2010

### http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-172.pdf

Solving square linear systems of equations
*Ax=b* is one of the primary workhorses in scientific computing. With asymptotically and practically small amounts of extra calculation and higher precision, we can render solution techniques \emph{dependable}. We produce a solution with tiny error for almost all systems where we should expect a tiny error, and we correctly flag potential failures.

Our method uses a proven technique: iterative refinement. We extend prior work by applying extra precision not only in calculating the residual *b-A\y _{i}* of an intermediate solution

*\y*but also in carrying that intermediate solution

_{i}*\y*. Analysis shows that extra precision in the intermediate solutions lowers the limiting backward error (measuring perturbations in the initial problem) to levels that produce a forward error (measuring perturbations in the solution) not much larger than the precision used to store the result. We also demonstrate that condition estimation is not necessary for determining success, reducing the computation in refinement substantially.

_{i}This basic, dependable solver applies to typical dense *LU* factorization methods using partial pivoting as well as methods that risk greater failure by choosing pivots for non-numerical reasons. Sparse factorization methods may choose pivots to promote structural sparsity or even choose pivots \emph{before} factorization to decouple the phases. We show through experiments that solutions using these restrictive pivoting methods still have small error so long as an estimate of factorization quality, the growth factor, does not grow too large. Our refinement algorithm dependably flags such failures. Additionally, we find a better choice of heuristic for sparse static pivoting than the defaults in Li and Demmel's SuperLU package.

Static pivoting in a distributed-memory setting needs an algorithm for choosing pivots that does not rely on fitting the entire matrix into one memory space. We investigate a set of algorithms, Bertsekas's auction algorithms, for choosing a static pivoting via maximum weight perfect bipartite matching. Auction algorithms have a natural mapping to distributed memory computation through their bidding mechanism. We provide an analysis of the auction algorithm fitting it comfortably in linear optimization theory and characterizing approximately maximum weight perfect bipartite matches. These approximately maximum weight perfect matches work well as static pivot choices and can be computed much more quickly than the exact maximum weight matching.

Finally, we consider the performance of auction algorithm implementations on a suite of real-world sparse problems. Sequential performance is roughly equivalent to existing implementations like Duff and Koster's MC64, but varies widely with different parameter and input settings. The parallel performance is even more wildly unpredictable. Computing approximately maximum weight matchings helps performance somewhat, but we still conclude that the performance is too variable for a black-box solution method.

**Advisor:** James Demmel

BibTeX citation:

@phdthesis{Riedy:EECS-2010-172, Author = {Riedy, Jason}, Title = {Making Static Pivoting Scalable and Dependable}, School = {EECS Department, University of California, Berkeley}, Year = {2010}, Month = {Dec}, URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-172.html}, Number = {UCB/EECS-2010-172}, Abstract = {Solving square linear systems of equations <i>Ax=b</i> is one of the primary workhorses in scientific computing. With asymptotically and practically small amounts of extra calculation and higher precision, we can render solution techniques \emph{dependable}. We produce a solution with tiny error for almost all systems where we should expect a tiny error, and we correctly flag potential failures. Our method uses a proven technique: iterative refinement. We extend prior work by applying extra precision not only in calculating the residual <i>b-A\y<sub>i</sub></i> of an intermediate solution <i>\y<sub>i</sub></i> but also in carrying that intermediate solution <i>\y<sub>i</sub></i>. Analysis shows that extra precision in the intermediate solutions lowers the limiting backward error (measuring perturbations in the initial problem) to levels that produce a forward error (measuring perturbations in the solution) not much larger than the precision used to store the result. We also demonstrate that condition estimation is not necessary for determining success, reducing the computation in refinement substantially. This basic, dependable solver applies to typical dense <i>LU</i> factorization methods using partial pivoting as well as methods that risk greater failure by choosing pivots for non-numerical reasons. Sparse factorization methods may choose pivots to promote structural sparsity or even choose pivots \emph{before} factorization to decouple the phases. We show through experiments that solutions using these restrictive pivoting methods still have small error so long as an estimate of factorization quality, the growth factor, does not grow too large. Our refinement algorithm dependably flags such failures. Additionally, we find a better choice of heuristic for sparse static pivoting than the defaults in Li and Demmel's SuperLU package. Static pivoting in a distributed-memory setting needs an algorithm for choosing pivots that does not rely on fitting the entire matrix into one memory space. We investigate a set of algorithms, Bertsekas's auction algorithms, for choosing a static pivoting via maximum weight perfect bipartite matching. Auction algorithms have a natural mapping to distributed memory computation through their bidding mechanism. We provide an analysis of the auction algorithm fitting it comfortably in linear optimization theory and characterizing approximately maximum weight perfect bipartite matches. These approximately maximum weight perfect matches work well as static pivot choices and can be computed much more quickly than the exact maximum weight matching. Finally, we consider the performance of auction algorithm implementations on a suite of real-world sparse problems. Sequential performance is roughly equivalent to existing implementations like Duff and Koster's MC64, but varies widely with different parameter and input settings. The parallel performance is even more wildly unpredictable. Computing approximately maximum weight matchings helps performance somewhat, but we still conclude that the performance is too variable for a black-box solution method.} }

EndNote citation:

%0 Thesis %A Riedy, Jason %T Making Static Pivoting Scalable and Dependable %I EECS Department, University of California, Berkeley %D 2010 %8 December 17 %@ UCB/EECS-2010-172 %U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-172.html %F Riedy:EECS-2010-172