Rohit Agarwal and Alec James and Joshua You

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2023-235

November 20, 2023

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-235.pdf

Sparse dynamic graph workloads are common for many modern database problems. Thus, it is necessary to have graph data structures that have high update and query throughput while also operating in a distributed manner, where many machines can access a shared structure. Furthermore, such a data structure should be cache-optimal in the face of locality exploiting algorithms like Breadth-first search. The cache friendly Packed Compact Sparse Row (PCSR) has already seen great use in the shared memory setting. We propose an extension of the PCSR, the DistPCSR, which uses ideas from the serial one-machine Packed Memory Array (PMA) to build a routing layer between many PCSRs on different machines. This allows us to take queries while updates are still being resolved, and to share load between servers to stop any one from using too much memory. We provide some guarantees for this data structure, allowing one to be confident in its answers in a distributed setting. We also describe how to use the UPC++ framework’s primitives to implement a DistPCSR. Finally, we test our implemented DistPCSR on realistic graph workloads with algorithms that are friendly with PCSRs and share our results.


BibTeX citation:

@techreport{Agarwal:EECS-2023-235,
    Author= {Agarwal, Rohit and James, Alec and You, Joshua},
    Title= {Packed Memory Arrays for Dynamic Graphs in the Distributed Memory Setting},
    Year= {2023},
    Month= {Nov},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-235.html},
    Number= {UCB/EECS-2023-235},
    Abstract= {Sparse dynamic graph workloads are common for many modern database problems. Thus, it is necessary to have graph data structures that have high update and query throughput while also operating in a distributed manner, where many machines can access a shared structure. Furthermore, such a data structure should be cache-optimal in the face of locality exploiting algorithms like Breadth-first search. The cache friendly Packed Compact Sparse Row (PCSR) has already seen great use in the shared memory setting. 
We propose an extension of the PCSR, the DistPCSR, which uses ideas from the serial one-machine Packed Memory Array (PMA) to build a routing layer between many PCSRs on different machines. This allows us to take queries while updates are still being resolved, and to share load between servers to stop any one from using too much memory. We provide some guarantees for this data structure, allowing one to be confident in its answers in a distributed setting. We also describe how to use the UPC++ framework’s primitives to implement a DistPCSR. 
Finally, we test our implemented DistPCSR on realistic graph workloads with algorithms that are friendly with
PCSRs and share our results.},
}

EndNote citation:

%0 Report
%A Agarwal, Rohit 
%A James, Alec 
%A You, Joshua 
%T Packed Memory Arrays for Dynamic Graphs in the Distributed Memory Setting
%I EECS Department, University of California, Berkeley
%D 2023
%8 November 20
%@ UCB/EECS-2023-235
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-235.html
%F Agarwal:EECS-2023-235