Accelerating Multilinear Maps and Structured Sparse Tensor Kernels
Vivek Bharadwaj
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2025-146
August 4, 2025
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-146.pdf
Linear maps dominate machine learning and scientific computing workloads today. What about multilinear maps? Just as a linear map with one argument can be represented by a 2D matrix, a D-dimensional multilinear map is represented by a (D + 1)-dimensional tensor. We apply the map by flattening the tensor into a matrix and multiplying it by the Kronecker product of the inputs. When a batch of inputs is provided, this primitive is known as the Matricized-Tensor-Times-Khatri-Rao Product (MTTKRP). Efficient multilinear maps are essential in computational chemistry, multi-way data analysis, and signal processing. Unfortunately, they receive comparatively little attention from theorists and high-performance kernel designers.
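The flattening recipe above can be sketched concretely in NumPy. This is a minimal illustration, not code from the thesis: a bilinear map (D = 2) stored as a 3D tensor of illustrative shape (m, n1, n2) is applied to a pair of vectors by matricizing the tensor and multiplying by the Kronecker product of the inputs, then checked against a direct contraction.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n1, n2 = 4, 3, 5          # illustrative dimensions
T = rng.standard_normal((m, n1, n2))   # the (D + 1)-dimensional tensor, D = 2
x1 = rng.standard_normal(n1)
x2 = rng.standard_normal(n2)

# Flatten the two input modes to get an m x (n1 * n2) matricization.
T_flat = T.reshape(m, n1 * n2)

# Apply the map: y = T_flat @ (x1 kron x2).
y = T_flat @ np.kron(x1, x2)

# Sanity check: contract both input modes directly.
y_direct = np.einsum('ijk,j,k->i', T, x1, x2)
assert np.allclose(y, y_direct)
```

Row-major reshaping is what makes `np.kron(x1, x2)` line up with the flattened modes; a column-major convention would pair with the arguments reversed.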
We optimize the multilinear map in two applications, making contributions that span theory and practical implementation. We first examine equivariant graph neural networks, which use a structured sparse tensor to couple node features with edge features. For this workload, we design a GPU kernel generator that matches or exceeds the best closed-source implementations. Our package, OpenEquivariance, provides a 5–6× end-to-end speedup for training quantum chemical foundation models. Our focus then shifts to CANDECOMP/PARAFAC decomposition, a higher-dimensional analogue of the matrix singular value decomposition. Here, we use randomized linear algebra to accelerate the MTTKRP in tall, overdetermined linear least-squares problems, scaling our work to thousands of CPU cores. The remaining chapters take a detour, adapting this randomized algorithm to sketch tensor trains, structures that originated in quantum mechanical computations. We also design communication-avoiding algorithms for a pair of kernels used in matrix completion and graph attention networks. Our work demonstrates that sustained attention to the multilinear map bears fruit across the computational stack.
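The batched form of the map, the MTTKRP, can also be sketched in a few lines of NumPy. This is an illustrative sketch with hypothetical shapes, not the thesis implementation: for a 3-way tensor and R input pairs stacked as the columns of factor matrices B and C, the MTTKRP is the matricized tensor times the Khatri-Rao product (the column-wise Kronecker product) of B and C.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n1, n2, R = 4, 3, 5, 6    # illustrative dimensions; R is the batch size
T = rng.standard_normal((m, n1, n2))
B = rng.standard_normal((n1, R))
C = rng.standard_normal((n2, R))

def khatri_rao(B, C):
    """Column-wise Kronecker product: output shape (n1 * n2, R)."""
    n1, R = B.shape
    n2, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(n1 * n2, R)

# MTTKRP: matricize the tensor, multiply by the Khatri-Rao product.
M = T.reshape(m, n1 * n2) @ khatri_rao(B, C)

# Each column of M is the multilinear map applied to one input pair.
M_direct = np.einsum('ijk,jr,kr->ir', T, B, C)
assert np.allclose(M, M_direct)
```

In CP decomposition's alternating least-squares updates, this kernel is the dominant cost, which is why the thesis targets it with randomized sketching.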
Advisors: James Demmel and Aydin Buluç
BibTeX citation:
@phdthesis{Bharadwaj:EECS-2025-146,
  author = {Bharadwaj, Vivek},
  title = {Accelerating Multilinear Maps and Structured Sparse Tensor Kernels},
  school = {EECS Department, University of California, Berkeley},
  year = {2025},
  month = {Aug},
  url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-146.html},
  number = {UCB/EECS-2025-146},
  abstract = {Linear maps dominate machine learning and scientific computing workloads today. What about multilinear maps? Just as a linear map with one argument can be represented by a 2D matrix, a D-dimensional multilinear map is represented by a (D + 1)-dimensional tensor. We apply the map by flattening the tensor into a matrix and multiplying it by the Kronecker product of the inputs. When a batch of inputs is provided, this primitive is known as the Matricized-Tensor-Times-Khatri-Rao Product (MTTKRP). Efficient multilinear maps are essential in computational chemistry, multi-way data analysis, and signal processing. Unfortunately, they receive comparatively less interest from theorists and high-performance kernel designers.
We optimize the multilinear map in two applications, making contributions that span theory and practical implementation. We first examine equivariant graph neural networks, which use a structured sparse tensor to interact node features with edge features. In response, we design a GPU kernel generator that matches or exceeds the best closed-source implementations for the problem. Our package, OpenEquivariance, provides 5–6× end-to-end speedup for training quantum chemical foundation models. Our focus then shifts to Candecomp/PARAFAC decomposition, a higher-dimensional analogue of the matrix singular value decomposition. Here, we use randomized linear algebra to accelerate the MTTKRP in tall, overdetermined linear least-squares problems, scaling our work to thousands of CPU cores. The remaining chapters detour by adapting this randomized algorithm to sketch tensor trains, structures that originated in quantum mechanical computations. We also design communication-avoiding algorithms for a pair of kernels used in matrix completion and graph attention networks. Our work demonstrates that sustained attention to the multilinear map yields fruit across the computational stack.},
}
EndNote citation:
%0 Thesis
%A Bharadwaj, Vivek
%T Accelerating Multilinear Maps and Structured Sparse Tensor Kernels
%I EECS Department, University of California, Berkeley
%D 2025
%8 August 4
%@ UCB/EECS-2025-146
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-146.html
%F Bharadwaj:EECS-2025-146