Randomized Numerical Linear Algebra: A Perspective on the Field With an Eye to Software (EECS-2023-19)
Riley Murray, James Demmel, Michael W. Mahoney, N. Benjamin Erichson, Maksim Melnichenko, Osman Asif Malik, Laura Grigori, Piotr Luszczek, Michal Derezinski, Miles E. Lopes, Tianyu Liang, Hengrui Luo and Jack Dongarra

High Efficiency Computation of Game Tree Exploration in Connect 4 (EECS-2022-219)
Justin Yokota

Parallelizing Irregular Applications for Distributed Memory Scalability: Case Studies from Genomics (EECS-2020-133)
Marquita Ellis

ImageNet Training in Minutes (EECS-2020-18)
Yang You, Zhao Zhang, Cho-Jui Hsieh, James Demmel and Kurt Keutzer

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes (EECS-2019-103)
Yang You, Jing Li, Sashank Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, James Demmel and Cho-Jui Hsieh

Large-Batch Training for LSTM and Beyond (EECS-2018-138)
Yang You, James Demmel, Kurt Keutzer, Cho-Jui Hsieh, Chris Ying and Jonathan Hseu

An arithmetic complexity lower bound for computing rational functions, with applications to structured and sparse linear algebra (EECS-2018-82)
James Demmel

Avoiding communication in primal and dual block coordinate descent methods (EECS-2016-197)
Aditya Devarakonda, Kimon Fountoulakis, James Demmel and Michael W. Mahoney

Parallelepipeds obtaining HBL lower bounds (EECS-2016-162)
James Demmel and Alex Rusciano

Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies (EECS-2016-151)
Alex Gittens, Aditya Devarakonda, Evan Racah, Michael Ringenburg, Lisa Gerhardt, Jey Kottalam, Jialin Liu, Kristyn Maschhoff, Shane Canon, Jatin Chhugani, Pramod Sharma, Jiyan Yang, James Demmel, Jim Harrell, Venkat Krishnamurthy, Michael W. Mahoney and Mr Prabhat

Low Rank Approximation of a Sparse Matrix Based on LU Factorization with Column and Row Tournament Pivoting (EECS-2016-122)
James Demmel, Laura Grigori and Sebastien Cayrols

Efficient Reproducible Floating Point Summation and BLAS (EECS-2016-121)
James Demmel, Peter Ahrens and Hong Diep Nguyen

Efficient Reproducible Floating Point Summation and BLAS (EECS-2015-229)
Peter Ahrens, Hong Diep Nguyen and James Demmel

Write-Avoiding Algorithms (EECS-2015-163)
Erin Carson, James Demmel, Laura Grigori, Nick Knight, Penporn Koanantakool, Oded Schwartz and Harsha Vardhan Simhadri

Matrix Multiplication Algorithm Selection with Support Vector Machines (EECS-2015-29)
Omer Spillinger, David Eliahu, Armando Fox and James Demmel

FRPA: A Framework for Recursive Parallel Algorithms (EECS-2015-28)
David Eliahu, Omer Spillinger, Armando Fox and James Demmel

CA-SVM: Communication-Avoiding Parallel Support Vector Machines on Distributed Systems (EECS-2015-9)
Yang You, James Demmel, Kenneth Czechowski, Le Song and Richard Vuduc

Accuracy of the s-step Lanczos method for the symmetric eigenproblem (EECS-2014-165)
Erin Carson and James Demmel

Contention Bounds for Combinations of Computation Graphs and Network Topologies (EECS-2014-147)
Grey Ballard, James Demmel, Andrew Gearhart, Benjamin Lipshitz, Oded Schwartz and Sivan Toledo

A massively parallel tensor contraction framework for coupled-cluster computations (EECS-2014-143)
Edgar Solomonik, Devin Matthews, Jeff Hammond, John Stanton and James Demmel

Error analysis of the s-step Lanczos method in finite precision (EECS-2014-55)
Erin Carson and James Demmel

Analysis of the finite precision s-step biconjugate gradient method (EECS-2014-18)
Erin Carson and James Demmel

Tradeoffs between synchronization, communication, and work in parallel linear algebra computations (EECS-2014-8)
Edgar Solomonik, Erin Carson, Nicholas Knight and James Demmel

Reconstructing Householder Vectors from Tall-Skinny QR (EECS-2013-175)
Grey Ballard, James Demmel, Laura Grigori, Mathias Jacquelin, Hong Diep Nguyen and Edgar Solomonik

Avoiding Communication in Successive Band Reduction (EECS-2013-131)
Grey Ballard, James Demmel and Nicholas Knight

Communication-Avoiding Symmetric-Indefinite Factorization (EECS-2013-127)
Grey Ballard, Dulceneia Becker, James Demmel, Jack Dongarra, Alex Druinsky, Inon Peled, Oded Schwartz, Sivan Toledo and Ichitaro Yamazaki

An arithmetic complexity lower bound for computing rational functions, with applications to linear algebra (EECS-2013-126)
James Demmel

Communication Lower Bounds and Optimal Algorithms for Programs That Reference Arrays - Part 1 (EECS-2013-61)
Michael Christ, James Demmel, Nicholas Knight, Thomas Scanlon and Katherine A. Yelick

Exploiting Data Sparsity in Parallel Matrix Powers Computations (EECS-2013-47)
Nicholas Knight, Erin Carson and James Demmel

Communication Avoiding Rank Revealing QR Factorization with Column Pivoting (EECS-2013-46)
James Demmel, Laura Grigori, Ming Gu and Hua Xiang

Communication Optimal Parallel Multiplication of Sparse Random Matrices (EECS-2013-13)
Grey Ballard, Aydin Buluc, James Demmel, Laura Grigori, Benjamin Lipshitz, Oded Schwartz and Sivan Toledo

Communication Efficient Gaussian Elimination with Partial Pivoting using a Shape Morphing Data Layout (EECS-2013-12)
Grey Ballard, James Demmel, Benjamin Lipshitz, Oded Schwartz and Sivan Toledo

Cyclops Tensor Framework: reducing communication and eliminating load imbalance in massively parallel contractions (EECS-2013-11)
Edgar Solomonik, Devin Matthews, Jeff Hammond and James Demmel

Minimizing communication in all-pairs shortest paths (EECS-2013-10)
Edgar Solomonik, Aydin Buluc and James Demmel

Communication-Avoiding Optimization of Geometric Multigrid on GPUs (EECS-2012-258)
Amik Singh

Autotuning Sparse Matrix-Vector Multiplication for Multicore (EECS-2012-215)
Jong-Ho Byun, Richard Lin, Katherine A. Yelick and James Demmel

Cyclops Tensor Framework: reducing communication and eliminating load imbalance in massively parallel contractions (EECS-2012-210)
Edgar Solomonik, Devin Matthews, Jeff Hammond and James Demmel

Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication (EECS-2012-205)
James Demmel, David Eliahu, Armando Fox, Shoaib Ashraf Kamil, Benjamin Lipshitz, Oded Schwartz and Omer Spillinger

A Residual Replacement Strategy for Improving the Maximum Attainable Accuracy of s-step Krylov Subspace Methods (EECS-2012-197)
Erin Carson and James Demmel

Graph Expansion Analysis for Communication Costs of Fast Rectangular Matrix Multiplication (EECS-2012-194)
Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz and Oded Schwartz

Instrumenting Linear Algebra Energy Consumption via On-chip Energy Counters (EECS-2012-168)
James Demmel and Andrew Gearhart

Perfect strong scaling using no additional energy (EECS-2012-126)
James Demmel, Andrew Gearhart, Oded Schwartz and Benjamin Lipshitz

Communication-Avoiding Parallel Strassen: Implementation and Performance (EECS-2012-90)
Benjamin Lipshitz, Grey Ballard, Oded Schwartz and James Demmel

Sequential Communication Bounds for Fast Linear Algebra (EECS-2012-36)
Grey Ballard, James Demmel, Olga Holtz and Oded Schwartz

Communication-Optimal Parallel Algorithm for Strassen’s Matrix Multiplication (EECS-2012-32)
Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz and Oded Schwartz

Strong Scaling of Matrix Multiplication Algorithms and Memory-Independent Communication Lower Bounds (EECS-2012-31)
Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz and Oded Schwartz

A preliminary analysis of Cyclops Tensor Framework (EECS-2012-29)
Edgar Solomonik, Jeff Hammond and James Demmel

Matrix multiplication on multidimensional torus networks (EECS-2012-28)
Edgar Solomonik and James Demmel

Minimizing communication in all-pairs shortest-paths (EECS-2012-19)
Edgar Solomonik, Aydin Buluc and James Demmel

LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version (EECS-2012-15)
Amal Khabou, James Demmel, Laura Grigori and Ming Gu

Avoiding Communication in Two-Sided Krylov Subspace Methods (EECS-2011-93)
Erin Carson, Nicholas Knight and James Demmel

Improving communication performance in dense linear algebra via topology aware collectives (EECS-2011-92)
Edgar Solomonik, Abhinav Bhatele and James Demmel

Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms (EECS-2011-72)
Edgar Solomonik and James Demmel

Graph Expansion and Communication Costs of Fast Matrix Multiplication (EECS-2011-40)
Grey Ballard, James Demmel, Olga Holtz and Oded Schwartz

Minimizing Communication in Numerical Linear Algebra (EECS-2011-15)
Grey Ballard, James Demmel, Olga Holtz and Oded Schwartz

Minimizing Communication for Eigenproblems and the Singular Value Decomposition (EECS-2011-14)
Grey Ballard, James Demmel and Ioana Dumitriu

Communication Bounds for Heterogeneous Architectures (EECS-2011-13)
Grey Ballard, James Demmel and Andrew Gearhart

Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms (EECS-2011-10)
Edgar Solomonik and James Demmel

TORCH Computational Reference Kernels: A Testbed for Computer Science Research (EECS-2010-144)
Alex Kaiser, Samuel Williams, Kamesh Madduri, Khaled Ibrahim, David Bailey, James Demmel and Erich Strohmaier

Communication-Avoiding QR Decomposition for GPUs (EECS-2010-131)
Michael Anderson, Grey Ballard, James Demmel and Kurt Keutzer

CALU: A Communication Optimal LU Factorization Algorithm (EECS-2010-29)
James Demmel, Laura Grigori and Hua Xiang

SEJITS: Getting Productivity and Performance With Selective Embedded JIT Specialization (EECS-2010-23)
Bryan Catanzaro, Shoaib Ashraf Kamil, Yunsup Lee, Krste Asanović, James Demmel, Kurt Keutzer, John Shalf, Katherine A. Yelick and Armando Fox

Communication-optimal Parallel and Sequential Cholesky decomposition (EECS-2009-29)
Grey Ballard, James Demmel, Olga Holtz and Oded Schwartz

Communication-optimal parallel and sequential QR and LU factorizations (EECS-2008-89)
James Demmel, Laura Grigori, Mark Frederick Hoemmen and Julien Langou

Non-Negative Diagonals and High Performance on Low-Profile Matrices from Householder QR (EECS-2008-76)
James Demmel, Mark Frederick Hoemmen, Yozo Hida and Jason Riedy

LU, QR and Cholesky Factorizations using Vector Capabilities of GPUs (EECS-2008-49)
Vasily Volkov and James Demmel

The Parallel Computing Laboratory at U.C. Berkeley: A Research Agenda Based on the Berkeley View (EECS-2008-23)
Krste Asanović, Ras Bodik, James Demmel, Tony Keaveny, Kurt Keutzer, John D. Kubiatowicz, Edward A. Lee, Nelson Morgan, George Necula, David A. Patterson, Koushik Sen, John Wawrzynek, David Wessel and Katherine A. Yelick

Using GPUs to Accelerate the Bisection Algorithm for Finding Eigenvalues of Symmetric Tridiagonal Matrices (EECS-2007-179)
Vasily Volkov and James Demmel

Avoiding Communication in Computing Krylov Subspaces (EECS-2007-123)
James Demmel, Mark Frederick Hoemmen, Marghoob Mohiyuddin and Katherine A. Yelick

Extra-precise Iterative Refinement for Overdetermined Least Squares Problems (EECS-2007-77)
James Demmel, Yozo Hida, Xiaoye Li and Edward Jason Riedy

Health Monitoring of Civil Infrastructures Using Wireless Sensor Networks (EECS-2006-121)
Sukun Kim, Shamim Pakzad, David E. Culler, James Demmel, Gregory Fenves, Steve Glaser and Martin Turon

Continuation of Invariant Subspaces for Large Bifurcation Problems (EECS-2006-13)
David Samuel Bindel, James Demmel and Mark Friedman

Error Bounds from Extra Precise Iterative Refinement (CSD-04-1344)
James W. Demmel, Yozo Hida, William Kahan, Xiaoye S. Li, Soni Mukherjee and E. Jason Riedy

Performance Modeling and Analysis of Cache Blocking in Sparse Matrix Vector Multiply (CSD-04-1335)
Rajesh Nishtala, Richard W. Vuduc, James W. Demmel and Katherine A. Yelick

Performance Optimizations and Bounds for Sparse Symmetric Matrix-Multiple Vector Multiply (CSD-03-1297)
Benjamin C. Lee, Richard W. Vuduc, James W. Demmel, Katherine A. Yelick, Michael de Lorimier and Lijue Zhong

Memory Hierarchy Optimizations and Performance Bounds for Sparse A^T Ax (CSD-03-1232)
Richard Vuduc, Attila Gyulassy, James Demmel and Katherine A. Yelick

Accurate Floating Point Summation (CSD-02-1180)
James Demmel and Yozo Hida

Modeling and Identifying Bottlenecks in the EOSDIS Architecture (CSD-97-957)
Sharon L. Smith, Melody Y. Ivory and James Demmel

SuperLU Users' Guide (CSD-97-944)
James W. Demmel, John Gilbert and Xiaoye S. Li

An Asynchronous Parallel Supernodal Algorithm for Sparse Gaussian Elimination (CSD-97-943)
James W. Demmel, John R. Gilbert and Xiaoye S. Li

Computing the Singular Value Decomposition with High Relative Accuracy (CSD-97-934)
James Demmel, Ming Gu, Stanley Eisenstat, Ivan Slapnicar, Kresimir Veselic and Zlatko Drmac

A Supernodal Approach to Sparse Partial Pivoting (CSD-95-883)
James W. Demmel, Stanley C. Eisenstat, John R. Gilbert, Xiaoye S. Li and Joseph W.H. Liu

On the Correctness of Parallel Bisection in Floating Point (CSD-94-805)
James W. Demmel, Inderjit Dhillon and Huan Ren

Inverse Free Parallel Spectral Divide and Conquer Algorithms for Nonsymmetric Eigenproblems (CSD-94-793)
Zhaojun Bai, James W. Demmel and Ming Gu

Faster Numerical Algorithms via Exception Handling (CSD-93-728)
James W. Demmel and Xiaoye Li

Computing the Generalized Singular Value Decomposition (CSD-92-720)
Zhaojun Bai and James W. Demmel

Design of a Parallel Nonsymmetric Eigenroutine Toolbox, Part I (CSD-92-718)
Zhaojun Bai and James W. Demmel

The Dimension of Matrices (Matrix Pencils) with Given Jordan (Kronecker) Canonical Forms (CSD-92-706)
James W. Demmel and Alan Edelman

Parallel Numerical Linear Algebra (CSD-92-703)
James W. Demmel, Michael T. Heath and Henk A. van der Vorst

Trading Off Parallelism and Numerical Stability (CSD-92-702)
James W. Demmel

Algorithms for Intersecting Parametric and Algebraic Curves (CSD-92-698)
Dinesh Manocha and James W. Demmel

Computing the Generalized Singular Value Decomposition (CSD-91-645)
Zhaojun Bai and James W. Demmel

Improved Error Bounds for Underdetermined System Solvers (CSD-90-587)
James W. Demmel and Nicholas J. Higham

Stability of Block Algorithms with Fast Level 3 BLAS (CSD-90-584)
James W. Demmel and Nicholas J. Higham

Effects of Underflow on Solving Linear Systems (CSD-83-128)
James W. Demmel

The Condition Number of Similarities that Diagonalize Matrices (CSD-83-127)
James W. Demmel

An Interval Algorithm for Solving Systems of Linear Equations to Prespecified Accuracy (CSD-83-126)
James W. Demmel and Fritz Kruckeberg