Scaling Graph Neural Networks for Sciences
Alok Tripathy
EECS Department, University of California, Berkeley
Technical Report No. UCB/
December 1, 2025
Graph Neural Networks (GNNs) have underpinned state-of-the-art results for a wide array of problems in recommender systems, molecular dynamics simulations, high-energy physics, and many other fields. However, graph datasets in these disciplines are frequently large. For instance, graphs for high-energy physics problems contain trillions of edges. As a consequence, methods must be developed to train large-scale GNN models on massive supercomputers.
However, scaling GNNs on supercomputers is difficult. GNNs are currently losers of the “hardware lottery”: their underlying kernels do not efficiently use the resources of a GPU-based supercomputer, unlike models such as Transformers that achieve high cluster utilization. This thesis outlines methods to accelerate distributed GNN training, making GNNs practical for downstream applications in pursuit of “winning” the hardware lottery.
The main ideas underlying this work are to express GNN training with sparse matrix multiplication and to scale training with communication-avoiding distributed sparse matrix algorithms. First, we outline how distributed sparse-dense matrix multiplication effectively scales full-batch GNN training. We dive further into this topic by introducing efficient sparsity-aware and load-balancing techniques. Second, we show how to accelerate minibatch GNN training by performing batch sampling with sparse-sparse matrix multiplication. Lastly, we show how these methods scale GNNs that solve particle track reconstruction, an important problem in high-energy physics.
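To make the two core ideas concrete, the following is a minimal, single-node sketch in Python with SciPy (not the thesis implementation, which targets distributed, communication-avoiding kernels): one GNN layer's neighborhood aggregation expressed as a sparse-dense matrix product (SpMM), and one hop of minibatch neighborhood expansion expressed as a sparse-sparse matrix product (SpGEMM). The helper names (normalize_adjacency, gnn_layer, expand_batch) are illustrative and not taken from the thesis.

# Minimal sketch (not the thesis code): GNN aggregation as SpMM and
# minibatch neighborhood expansion as SpGEMM. Illustrative names only.
import numpy as np
import scipy.sparse as sp

def normalize_adjacency(A: sp.csr_matrix) -> sp.csr_matrix:
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} used by GCN-style layers."""
    A_hat = A + sp.eye(A.shape[0], format="csr")
    deg = np.asarray(A_hat.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(deg))
    return (d_inv_sqrt @ A_hat @ d_inv_sqrt).tocsr()

def gnn_layer(A_norm: sp.csr_matrix, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One layer's forward pass: SpMM (A_norm @ H), a dense GEMM with W, then ReLU."""
    return np.maximum(A_norm @ H @ W, 0.0)

def expand_batch(A: sp.csr_matrix, batch: np.ndarray) -> np.ndarray:
    """One hop of minibatch expansion as SpGEMM: a sparse selection matrix Q picks the
    rows of A for the batch vertices; the product's column indices are the neighbors."""
    n = A.shape[0]
    Q = sp.csr_matrix(
        (np.ones(len(batch)), (np.arange(len(batch)), batch)), shape=(len(batch), n)
    )
    frontier = Q @ A  # sparse-sparse product
    return np.unique(frontier.indices)

if __name__ == "__main__":
    # Toy 4-vertex path graph with 2-dimensional features.
    rows, cols = [0, 1, 1, 2, 2, 3], [1, 0, 2, 1, 3, 2]
    A = sp.csr_matrix((np.ones(6), (rows, cols)), shape=(4, 4))
    H, W = np.random.rand(4, 2), np.random.rand(2, 2)
    print(gnn_layer(normalize_adjacency(A), H, W))  # updated node features
    print(expand_batch(A, np.array([0, 1])))        # 1-hop neighborhood of the batch

In the distributed setting studied in the thesis, these same products are partitioned across processes, and the communication-avoiding algorithms determine how the sparse and dense operands are replicated and shifted to reduce data movement.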
Advisors: Katherine A. Yelick and Aydin Buluç
BibTeX citation:
@phdthesis{Tripathy:31789,
    Author = {Tripathy, Alok},
    Title = {Scaling Graph Neural Networks for Sciences},
    School = {EECS Department, University of California, Berkeley},
    Year = {2025},
    Number = {UCB/},
    Abstract = {Graph Neural Networks (GNNs) have underpinned state-of-the-art results for a wide array of problems in recommender systems, molecular dynamics simulations, high-energy physics, and many other fields. However, graph datasets in these disciplines are frequently large. For instance, graphs for high-energy physics problems contain trillions of edges. As a consequence, methods must be developed to train large-scale GNN models on massive supercomputers. However, scaling GNNs on supercomputers is difficult. GNNs are currently losers of the “hardware lottery”: their underlying kernels do not efficiently use the resources of a GPU-based supercomputer, unlike models such as Transformers that achieve high cluster utilization. This thesis outlines methods to accelerate distributed GNN training, making GNNs practical for downstream applications in pursuit of “winning” the hardware lottery. The main ideas underlying this work are to express GNN training with sparse matrix multiplication and to scale training with communication-avoiding distributed sparse matrix algorithms. First, we outline how distributed sparse-dense matrix multiplication effectively scales full-batch GNN training. We dive further into this topic by introducing efficient sparsity-aware and load-balancing techniques. Second, we show how to accelerate minibatch GNN training by performing batch sampling with sparse-sparse matrix multiplication. Lastly, we show how these methods scale GNNs that solve particle track reconstruction, an important problem in high-energy physics.},
}
EndNote citation:
%0 Thesis
%A Tripathy, Alok
%T Scaling Graph Neural Networks for Sciences
%I EECS Department, University of California, Berkeley
%D 2025
%8 December 1
%@ UCB/
%F Tripathy:31789