Ujjaini Mukhopadhyay

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2023-253

December 1, 2023

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-253.pdf

Graph neural network (GNN) training has low computational intensity, and thus communication costs can limit scalability. Sparse-matrix dense-matrix multiplication (SpMM), where the dense matrix is tall and skinny, is the bottleneck in full-graph training of GNNs. Previous work on distributing SpMM focused on sparsity-oblivious algorithms, where blocks of the matrices are communicated regardless of the sparsity pattern. This provides predictable communication patterns that can be overlapped with computation and maximizes the use of optimized collective communication functions, but it wastes significant bandwidth by communicating unnecessary data.
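
For concreteness, the sketch below (Python with SciPy; not from the report, and with hypothetical sizes) shows the single-process form of this SpMM: a sparse adjacency matrix times a tall-and-skinny dense feature matrix. Each nonzero contributes only 2f floating-point operations, which is why data movement rather than computation dominates.

# Illustrative sketch only: the local form of the SpMM described above.
import numpy as np
import scipy.sparse as sp

n, f = 100_000, 128                         # graph vertices and feature width (f << n)
A = sp.random(n, n, density=1e-4, format="csr", dtype=np.float32)  # sparse adjacency
H = np.random.rand(n, f).astype(np.float32)                        # tall-and-skinny features

Z = A @ H   # one feature-propagation step; each nonzero of A contributes only 2*f flops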

We present sparsity-aware algorithms that communicate only the parts of matrix blocks that are needed based on the sparsity pattern of the input graph. We couple our sparsity-aware SpMM algorithm with a communication-avoiding (1.5D) approach and a specialized graph partitioning algorithm that minimizes the maximum data volume communicated per process in addition to the total data volume. This addresses communication load imbalance and therefore total cost better than average volume alone. We explore the trade-offs across different graph problems and machine sizes, with the combined optimizations showing up to 14x improvement on 256 GPUs relative to a popular GNN framework based on communication-oblivious SpMM.
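
The sketch below is not the report's implementation; it only illustrates the sparsity-aware principle under a simple assumption that each process owns one block row of the sparse matrix (with global column indices) and the dense matrix is split into equal-height row blocks. The function name rows_to_fetch and all sizes are hypothetical.

# Hedged sketch of the sparsity-aware idea (not the report's code): a process
# requests only the dense-matrix rows whose indices appear as nonzero columns
# in its local block row, instead of receiving whole remote blocks.
import numpy as np
import scipy.sparse as sp

def rows_to_fetch(local_block, block_rows):
    """Map each owning process to the global dense rows this process must fetch.

    local_block: this process's CSR block row of the sparse matrix, columns indexed globally.
    block_rows:  number of dense-matrix rows owned by each process (assumed uniform).
    """
    cols = np.unique(local_block.indices)   # global columns with at least one local nonzero
    owners = cols // block_rows             # dense row c lives on process c // block_rows
    return {int(p): cols[owners == p] for p in np.unique(owners)}

# Hypothetical example: only the touched rows are requested, which is where the
# bandwidth savings over communicating full blocks comes from.
n, P = 100_000, 16
local = sp.random(n // P, n, density=1e-4, format="csr", dtype=np.float32)
requests = rows_to_fetch(local, n // P)
print(sum(len(v) for v in requests.values()), "rows requested, versus", n, "in the sparsity-oblivious case")

The per-process sizes of these request lists are exactly what the report's partitioning objective targets: minimizing the maximum volume communicated by any process, not just the total, so that one overloaded process does not dominate the communication time.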

Advisors: Katherine A. Yelick and Aydin Buluç


BibTeX citation:

@mastersthesis{Mukhopadhyay:EECS-2023-253,
    Author= {Mukhopadhyay, Ujjaini},
    Editor= {Yelick, Katherine A. and Buluç, Aydin},
    Title= {Sparsity-aware communication for distributed graph neural network training},
    School= {EECS Department, University of California, Berkeley},
    Year= {2023},
    Month= {Dec},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-253.html},
    Number= {UCB/EECS-2023-253},
    Abstract= {Graph neural network (GNN) training has low computational intensity, and thus communication costs can limit scalability. Sparse-matrix dense-matrix multiplication (SpMM), where the dense matrix is tall and skinny, is the bottleneck in full-graph training of GNNs. Previous work on distributing SpMM focused on sparsity-oblivious algorithms, where blocks of the matrices are communicated regardless of the sparsity pattern. This provides predictable communication patterns that can be overlapped with computation and maximizes the use of optimized collective communication functions, but it wastes significant bandwidth by communicating unnecessary data.

We present sparsity-aware algorithms that communicate only the parts of matrix blocks that are needed based on the sparsity pattern of the input graph. We couple our sparsity-aware SpMM algorithm with a communication-avoiding (1.5D) approach and a specialized graph partitioning algorithm that minimizes the maximum data volume communicated per process in addition to the total data volume. This addresses communication load imbalance and therefore total cost better than average volume alone. We explore the trade-offs across different graph problems and machine sizes, with the combined optimizations showing up to 14x improvement on 256 GPUs relative to a popular GNN framework based on communication-oblivious SpMM.},
}

EndNote citation:

%0 Thesis
%A Mukhopadhyay, Ujjaini 
%E Yelick, Katherine A. 
%E Buluç, Aydin 
%T Sparsity-aware communication for distributed graph neural network training
%I EECS Department, University of California, Berkeley
%D 2023
%8 December 1
%@ UCB/EECS-2023-253
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-253.html
%F Mukhopadhyay:EECS-2023-253