Modeling Parallel Sorts with LogP on the CM-5

Andrea Carol Dusseau

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-94-829
September 1994

http://www2.eecs.berkeley.edu/Pubs/TechRpts/1994/CSD-94-829.pdf

In this paper, the LogP model is used to analyze four parallel sorting algorithms (bitonic, column, radix, and sample sort). LogP characterizes the performance of modern parallel machines with a small set of parameters: the communication latency (L), overhead (o), bandwidth (g), and the number of processors (P). We develop implementations of these algorithms in Split-C, a parallel extension to C, and compare the performance predicted by LogP to actual performance on a CM-5 of 32 to 512 processors for a range of problem sizes and input sets. The sensitivity of the algorithms is evaluated by varying the distribution of key values and the rank ordering of the input.
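
As a rough illustration of how the four parameters combine into a time estimate, the sketch below applies the standard LogP accounting for small messages: a sender pays o, the message spends L in the network, and the receiver pays o. Here g is treated as the gap, the minimum interval between consecutive sends from one processor (the reciprocal of per-processor bandwidth). The function names and the numeric values are assumptions made for this example; they are not taken from the report.

```python
def point_to_point(L, o):
    """Estimated time for one small message: send overhead + latency + receive overhead."""
    return o + L + o


def pipelined_send(n, L, o, g):
    """Estimated time until the last of n back-to-back small messages is received.

    Consecutive sends from one processor are spaced by max(g, o): the
    processor cannot inject faster than the gap allows, nor faster than
    it can pay the per-message overhead.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    interval = max(g, o)
    return o + (n - 1) * interval + L + o


if __name__ == "__main__":
    # Hypothetical parameter values in microseconds, chosen only for illustration.
    L, o, g = 6.0, 2.0, 4.0
    print(point_to_point(L, o))
    print(pipelined_send(3, L, o, g))
```

With these placeholder values, a single message costs L + 2o, and each additional message adds one gap as long as g is at least o.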

The LogP model is shown to be a valuable guide in the development of parallel algorithms and a good predictor of implementation performance. The model encourages the use of data layouts which minimize communication and balanced communication schedules which avoid contention. Using an empirical model of local processor performance, LogP predictions closely match observed execution times on uniformly distributed keys across a broad range of problem and machine sizes for all four algorithms. Communication performance is oblivious to the distribution of the key values, whereas the local sort performance is not. The communication phases in radix and sample sort are sensitive to the ordering of keys, because certain layouts result in contention.
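
One way to picture a balanced communication schedule is a staggered all-to-all exchange, sketched below. In round k, processor i sends to processor (i + k) mod P, so every destination receives from exactly one source per round. The naive schedule, in which every processor walks the destination list in the same order, is included for contrast. This is a generic illustration with hypothetical function names, not necessarily the schedule used in the report's Split-C implementations.

```python
from collections import Counter


def staggered_schedule(P):
    """Rounds 1..P-1; in round k, processor src sends to (src + k) mod P."""
    return [[(src + k) % P for src in range(P)] for k in range(1, P)]


def naive_schedule(P):
    """Every processor visits destinations in the same order 0, 1, ..., skipping itself."""
    order = [[d for d in range(P) if d != src] for src in range(P)]
    return [[order[src][k] for src in range(P)] for k in range(P - 1)]


def max_receiver_load(round_dests):
    """Largest number of messages aimed at a single destination in one round."""
    return max(Counter(round_dests).values())


if __name__ == "__main__":
    P = 8
    print([max_receiver_load(r) for r in staggered_schedule(P)])
    print([max_receiver_load(r) for r in naive_schedule(P)])
```

In the staggered schedule the per-round load on any receiver stays at one message, whereas the naive ordering aims P - 1 messages at processor 0 in the first round.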


BibTeX citation:

@techreport{Dusseau:CSD-94-829,
    Author = {Dusseau, Andrea Carol},
    Title = {Modeling Parallel Sorts with LogP on the CM-5},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {1994},
    Month = {Sep},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/1994/5847.html},
    Number = {UCB/CSD-94-829},
    Abstract = {In this paper, the LogP model is used to analyze four parallel sorting algorithms (bitonic, column, radix, and sample sort). LogP characterizes the performance of modern parallel machines with a small set of parameters: the communication latency (L), overhead (o), bandwidth (g), and the number of processors (P). We develop implementations of these algorithms in Split-C, a parallel extension to C, and compare the performance predicted by LogP to actual performance on a CM-5 of 32 to 512 processors for a range of problem sizes and input sets. The sensitivity of the algorithms is evaluated by varying the distribution of key values and the rank ordering of the input. The LogP model is shown to be a valuable guide in the development of parallel algorithms and a good predictor of implementation performance. The model encourages the use of data layouts which minimize communication and balanced communication schedules which avoid contention. Using an empirical model of local processor performance, LogP predictions closely match observed execution times on uniformly distributed keys across a broad range of problem and machine sizes for all four algorithms. Communication performance is oblivious to the distribution of the key values, whereas the local sort performance is not. The communication phases in radix and sample sort are sensitive to the ordering of keys, because certain layouts result in contention.}
}

EndNote citation:

%0 Report
%A Dusseau, Andrea Carol
%T Modeling Parallel Sorts with LogP on the CM-5
%I EECS Department, University of California, Berkeley
%D 1994
%@ UCB/CSD-94-829
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/1994/5847.html
%F Dusseau:CSD-94-829