Technical Reports - James Demmel
Randomized Numerical Linear Algebra: A Perspective on the Field With an Eye to Software (EECS-2023-19)
Riley Murray, James Demmel, Michael W. Mahoney, N. Benjamin Erichson, Maksim Melnichenko, Osman Asif Malik, Laura Grigori, Piotr Luszczek, Michal Derezinski, Miles E. Lopes, Tianyu Liang, Hengrui Luo and Jack Dongarra
High Efficiency Computation of Game Tree Exploration in Connect 4 (EECS-2022-219)
Justin Yokota
Parallelizing Irregular Applications for Distributed Memory Scalability: Case Studies from Genomics (EECS-2020-133)
Marquita Ellis
ImageNet Training in Minutes (EECS-2020-18)
Yang You, Zhao Zhang, Cho-Jui Hsieh, James Demmel and Kurt Keutzer
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes (EECS-2019-103)
Yang You, Jing Li, Sashank Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, James Demmel and Cho-Jui Hsieh
Large-Batch Training for LSTM and Beyond (EECS-2018-138)
Yang You, James Demmel, Kurt Keutzer, Cho-Jui Hsieh, Chris Ying and Jonathan Hseu
An arithmetic complexity lower bound for computing rational functions, with applications to structured and sparse linear algebra (EECS-2018-82)
James Demmel
Avoiding communication in primal and dual block coordinate descent methods (EECS-2016-197)
Aditya Devarakonda, Kimon Fountoulakis, James Demmel and Michael W. Mahoney
Parallelepipeds obtaining HBL lower bounds (EECS-2016-162)
James Demmel and Alex Rusciano
Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies (EECS-2016-151)
Alex Gittens, Aditya Devarakonda, Evan Racah, Michael Ringenburg, Lisa Gerhardt, Jey Kottalam, Jialin Liu, Kristyn Maschhoff, Shane Canon, Jatin Chhugani, Pramod Sharma, Jiyan Yang, James Demmel, Jim Harrell, Venkat Krishnamurthy, Michael W. Mahoney and Mr Prabhat
Low Rank Approximation of a Sparse Matrix Based on LU Factorization with Column and Row Tournament Pivoting (EECS-2016-122)
James Demmel, Laura Grigori and Sebastien Cayrols
Efficient Reproducible Floating Point Summation and BLAS (EECS-2016-121)
James Demmel, Willow Ahrens and Hong Diep Nguyen
Efficient Reproducible Floating Point Summation and BLAS (EECS-2015-229)
Willow Ahrens, Hong Diep Nguyen and James Demmel
Write-Avoiding Algorithms (EECS-2015-163)
Erin Carson, James Demmel, Laura Grigori, Nick Knight, Penporn Koanantakool, Oded Schwartz and Harsha Vardhan Simhadri
Matrix Multiplication Algorithm Selection with Support Vector Machines (EECS-2015-29)
Omer Spillinger, David Eliahu, Armando Fox and James Demmel
FRPA: A Framework for Recursive Parallel Algorithms (EECS-2015-28)
David Eliahu, Omer Spillinger, Armando Fox and James Demmel
CA-SVM: Communication-Avoiding Parallel Support Vector Machines on Distributed Systems (EECS-2015-9)
Yang You, James Demmel, Kenneth Czechowski, Le Song and Richard Vuduc
Accuracy of the s-step Lanczos method for the symmetric eigenproblem (EECS-2014-165)
Erin Carson and James Demmel
Contention Bounds for Combinations of Computation Graphs and Network Topologies (EECS-2014-147)
Grey Ballard, James Demmel, Andrew Gearhart, Benjamin Lipshitz, Oded Schwartz and Sivan Toledo
A massively parallel tensor contraction framework for coupled-cluster computations (EECS-2014-143)
Edgar Solomonik, Devin Matthews, Jeff Hammond, John Stanton and James Demmel
Error analysis of the s-step Lanczos method in finite precision (EECS-2014-55)
Erin Carson and James Demmel
Analysis of the finite precision s-step biconjugate gradient method (EECS-2014-18)
Erin Carson and James Demmel
Tradeoffs between synchronization, communication, and work in parallel linear algebra computations (EECS-2014-8)
Edgar Solomonik, Erin Carson, Nicholas Knight and James Demmel
Reconstructing Householder Vectors from Tall-Skinny QR (EECS-2013-175)
Grey Ballard, James Demmel, Laura Grigori, Mathias Jacquelin, Hong Diep Nguyen and Edgar Solomonik
Avoiding Communication in Successive Band Reduction (EECS-2013-131)
Grey Ballard, James Demmel and Nicholas Knight
Communication-Avoiding Symmetric-Indefinite Factorization (EECS-2013-127)
Grey Ballard, Dulceneia Becker, James Demmel, Jack Dongarra, Alex Druinsky, Inon Peled, Oded Schwartz, Sivan Toledo and Ichitaro Yamazaki
An arithmetic complexity lower bound for computing rational functions, with applications to linear algebra (EECS-2013-126)
James Demmel
Communication Lower Bounds and Optimal Algorithms for Programs That Reference Arrays - Part 1 (EECS-2013-61)
Michael Christ, James Demmel, Nicholas Knight, Thomas Scanlon and Katherine A. Yelick
Exploiting Data Sparsity in Parallel Matrix Powers Computations (EECS-2013-47)
Nicholas Knight, Erin Carson and James Demmel
Communication Avoiding Rank Revealing QR Factorization with Column Pivoting (EECS-2013-46)
James Demmel, Laura Grigori, Ming Gu and Hua Xiang
Communication Optimal Parallel Multiplication of Sparse Random Matrices (EECS-2013-13)
Grey Ballard, Aydin Buluc, James Demmel, Laura Grigori, Benjamin Lipshitz, Oded Schwartz and Sivan Toledo
Communication Efficient Gaussian Elimination with Partial Pivoting using a Shape Morphing Data Layout (EECS-2013-12)
Grey Ballard, James Demmel, Benjamin Lipshitz, Oded Schwartz and Sivan Toledo
Cyclops Tensor Framework: reducing communication and eliminating load imbalance in massively parallel contractions (EECS-2013-11)
Edgar Solomonik, Devin Matthews, Jeff Hammond and James Demmel
Minimizing communication in all-pairs shortest paths (EECS-2013-10)
Edgar Solomonik, Aydin Buluc and James Demmel
Communication-Avoiding Optimization of Geometric Multigrid on GPUs (EECS-2012-258)
Amik Singh
Autotuning Sparse Matrix-Vector Multiplication for Multicore (EECS-2012-215)
Jong-Ho Byun, Richard Lin, Katherine A. Yelick and James Demmel
Cyclops Tensor Framework: reducing communication and eliminating load imbalance in massively parallel contractions (EECS-2012-210)
Edgar Solomonik, Devin Matthews, Jeff Hammond and James Demmel
Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication (EECS-2012-205)
James Demmel, David Eliahu, Armando Fox, Shoaib Ashraf Kamil, Benjamin Lipshitz, Oded Schwartz and Omer Spillinger
A Residual Replacement Strategy for Improving the Maximum Attainable Accuracy of s-step Krylov Subspace Methods (EECS-2012-197)
Erin Carson and James Demmel
Graph Expansion Analysis for Communication Costs of Fast Rectangular Matrix Multiplication (EECS-2012-194)
Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz and Oded Schwartz
Instrumenting Linear Algebra Energy Consumption via On-chip Energy Counters (EECS-2012-168)
James Demmel and Andrew Gearhart
Perfect strong scaling using no additional energy (EECS-2012-126)
James Demmel, Andrew Gearhart, Oded Schwartz and Benjamin Lipshitz
Communication-Avoiding Parallel Strassen: Implementation and Performance (EECS-2012-90)
Benjamin Lipshitz, Grey Ballard, Oded Schwartz and James Demmel
Sequential Communication Bounds for Fast Linear Algebra (EECS-2012-36)
Grey Ballard, James Demmel, Olga Holtz and Oded Schwartz
Communication-Optimal Parallel Algorithm for Strassen’s Matrix Multiplication (EECS-2012-32)
Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz and Oded Schwartz
Strong Scaling of Matrix Multiplication Algorithms and Memory-Independent Communication Lower Bounds (EECS-2012-31)
Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz and Oded Schwartz
A preliminary analysis of Cyclops Tensor Framework (EECS-2012-29)
Edgar Solomonik, Jeff Hammond and James Demmel
Matrix multiplication on multidimensional torus networks (EECS-2012-28)
Edgar Solomonik and James Demmel
Minimizing communication in all-pairs shortest-paths (EECS-2012-19)
Edgar Solomonik, Aydin Buluc and James Demmel
LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version (EECS-2012-15)
Amal Khabou, James Demmel, Laura Grigori and Ming Gu
Avoiding Communication in Two-Sided Krylov Subspace Methods (EECS-2011-93)
Erin Carson, Nicholas Knight and James Demmel
Improving communication performance in dense linear algebra via topology aware collectives (EECS-2011-92)
Edgar Solomonik, Abhinav Bhatele and James Demmel
Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms (EECS-2011-72)
Edgar Solomonik and James Demmel
Graph Expansion and Communication Costs of Fast Matrix Multiplication (EECS-2011-40)
Grey Ballard, James Demmel, Olga Holtz and Oded Schwartz
Minimizing Communication in Numerical Linear Algebra (EECS-2011-15)
Grey Ballard, James Demmel, Olga Holtz and Oded Schwartz
Minimizing Communication for Eigenproblems and the Singular Value Decomposition (EECS-2011-14)
Grey Ballard, James Demmel and Ioana Dumitriu
Communication Bounds for Heterogeneous Architectures (EECS-2011-13)
Grey Ballard, James Demmel and Andrew Gearhart
Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms (EECS-2011-10)
Edgar Solomonik and James Demmel
TORCH Computational Reference Kernels: A Testbed for Computer Science Research (EECS-2010-144)
Alex Kaiser, Samuel Williams, Kamesh Madduri, Khaled Ibrahim, David Bailey, James Demmel and Erich Strohmaier
Communication-Avoiding QR Decomposition for GPUs (EECS-2010-131)
Michael Anderson, Grey Ballard, James Demmel and Kurt Keutzer
CALU: A Communication Optimal LU Factorization Algorithm (EECS-2010-29)
James Demmel, Laura Grigori and Hua Xiang
SEJITS: Getting Productivity and Performance With Selective Embedded JIT Specialization (EECS-2010-23)
Bryan Catanzaro, Shoaib Ashraf Kamil, Yunsup Lee, Krste Asanović, James Demmel, Kurt Keutzer, John Shalf, Katherine A. Yelick and Armando Fox
Communication-optimal Parallel and Sequential Cholesky decomposition (EECS-2009-29)
Grey Ballard, James Demmel, Olga Holtz and Oded Schwartz
Communication-optimal parallel and sequential QR and LU factorizations (EECS-2008-89)
James Demmel, Laura Grigori, Mark Frederick Hoemmen and Julien Langou
Non-Negative Diagonals and High Performance on Low-Profile Matrices from Householder QR (EECS-2008-76)
James Demmel, Mark Frederick Hoemmen, Yozo Hida and Jason Riedy
LU, QR and Cholesky Factorizations using Vector Capabilities of GPUs (EECS-2008-49)
Vasily Volkov and James Demmel
The Parallel Computing Laboratory at U.C. Berkeley: A Research Agenda Based on the Berkeley View (EECS-2008-23)
Krste Asanović, Ras Bodik, James Demmel, Tony Keaveny, Kurt Keutzer, John D. Kubiatowicz, Edward A. Lee, Nelson Morgan, George Necula, David A. Patterson, Koushik Sen, John Wawrzynek, David Wessel and Katherine A. Yelick
Using GPUs to Accelerate the Bisection Algorithm for Finding Eigenvalues of Symmetric Tridiagonal Matrices (EECS-2007-179)
Vasily Volkov and James Demmel
Avoiding Communication in Computing Krylov Subspaces (EECS-2007-123)
James Demmel, Mark Frederick Hoemmen, Marghoob Mohiyuddin and Katherine A. Yelick
Extra-precise Iterative Refinement for Overdetermined Least Squares Problems (EECS-2007-77)
James Demmel, Yozo Hida, Xiaoye Li and Edward Jason Riedy
Health Monitoring of Civil Infrastructures Using Wireless Sensor Networks (EECS-2006-121)
Sukun Kim, Shamim Pakzad, David E. Culler, James Demmel, Gregory Fenves, Steve Glaser and Martin Turon
Continuation of Invariant Subspaces for Large Bifurcation Problems (EECS-2006-13)
David Samuel Bindel, James Demmel and Mark Friedman
Error Bounds from Extra Precise Iterative Refinement (CSD-04-1344)
James W. Demmel, Yozo Hida, William Kahan, Xiaoye S. Li, Sonil Mukherjee and E. Jason Riedy
Performance Modeling and Analysis of Cache Blocking in Sparse Matrix Vector Multiply (CSD-04-1335)
Rajesh Nishtala, Richard W. Vuduc, James W. Demmel and Katherine A. Yelick
Performance Optimizations and Bounds for Sparse Symmetric Matrix-Multiple Vector Multiply (CSD-03-1297)
Benjamin C. Lee, Richard W. Vuduc, James W. Demmel, Katherine A. Yelick, Michael de Lorimier and Lijue Zhong
Memory Hierarchy Optimizations and Performance Bounds for Sparse A^T Ax (CSD-03-1232)
Richard Vuduc, Attila Gyulassy, James Demmel and Katherine A. Yelick
Accurate Floating Point Summation (CSD-02-1180)
James Demmel and Yozo Hida
Modeling and Identifying Bottlenecks in the EOSDIS Architecture (CSD-97-957)
Sharon L. Smith, Melody Y. Ivory and James Demmel
SuperLU Users' Guide (CSD-97-944)
James W. Demmel, John Gilbert and Xiaoye S. Li
An Asynchronous Parallel Supernodal Algorithm for Sparse Gaussian Elimination (CSD-97-943)
James W. Demmel, John R. Gilbert and Xiaoye S. Li
Computing the Singular Value Decomposition with High Relative Accuracy (CSD-97-934)
James Demmel, Ming Gu, Stanley Eisenstat, Ivan Slapnicar, Kresimir Veselic and Zlatko Drmac
A Supernodal Approach to Sparse Partial Pivoting (CSD-95-883)
James W. Demmel, Stanley C. Eisenstat, John R. Gilbert, Xiaoye S. Li and Joseph W.H. Liu
On the Correctness of Parallel Bisection in Floating Point (CSD-94-805)
James W. Demmel, Inderjit Dhillon and Huan Ren
Inverse Free Parallel Spectral Divide and Conquer Algorithms for Nonsymmetric Eigenproblems (CSD-94-793)
Zhaojun Bai, James W. Demmel and Ming Gu
Faster Numerical Algorithms via Exception Handling (CSD-93-728)
James W. Demmel and Xiaoye Li
Computing the Generalized Singular Value Decomposition (CSD-92-720)
Zhaojun Bai and James W. Demmel
Design of a Parallel Nonsymmetric Eigenroutine Toolbox, Part I (CSD-92-718)
Zhaojun Bai and James W. Demmel
The Dimension of Matrices (Matrix Pencils) with Given Jordan (Kronecker) Canonical Forms (CSD-92-706)
James W. Demmel and Alan Edelman
Parallel Numerical Linear Algebra (CSD-92-703)
James W. Demmel, Michael T. Heath and Henk A. van der Vorst
Trading Off Parallelism and Numerical Stability (CSD-92-702)
James W. Demmel
Algorithms for Intersecting Parametric and Algebraic Curves (CSD-92-698)
Dinesh Manocha and James W. Demmel
Computing the Generalized Singular Value Decomposition (CSD-91-645)
Zhaojun Bai and James W. Demmel
Improved Error Bounds for Underdetermined System Solvers (CSD-90-587)
James W. Demmel and Nicholas J. Higham
Stability of Block Algorithms with Fast Level 3 BLAS (CSD-90-584)
James W. Demmel and Nicholas J. Higham
Effects of Underflow on Solving Linear Systems (CSD-83-128)
James W. Demmel
The Condition Number of Similarities that Diagonalize Matrices (CSD-83-127)
James W. Demmel
An Interval Algorithm for Solving Systems of Linear Equations to Prespecified Accuracy (CSD-83-126)
James W. Demmel and Fritz Kruckeberg