Effect of Communication Latency, Overhead, and Bandwidth on a Cluster Architecture

Richard Martin, Amin Vahdat, David Culler and Thomas Anderson

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-96-925
November 1996

http://www2.eecs.berkeley.edu/Pubs/TechRpts/1996/CSD-96-925.pdf

This work provides a systematic study of the impact of communication performance on parallel applications in a high performance network of workstations. We develop an experimental system in which the communication latency, overhead, and bandwidth can be independently varied to observe the effects on a wide range of applications. Our results indicate that current efforts to improve cluster communication performance to that of tightly integrated parallel machines result in significantly improved application performance. We show that applications demonstrate strong sensitivity to overhead, slowing down by a factor of 60 on 32 processors when overhead is increased from 3 to 103 µs. Applications in this study are also sensitive to per-message bandwidth, but are surprisingly tolerant of increased latency and lower per-byte bandwidth. Finally, most applications demonstrate a highly linear dependence on both overhead and per-message bandwidth, indicating that further improvements in communication performance will continue to improve application performance.
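
As a rough illustration of the linear dependence on overhead described in the abstract, the sketch below interpolates a slowdown factor between the two overhead values the abstract reports (3 µs and 103 µs). It is not the report's model: the straight line through those two endpoints, the use of the factor-of-60 figure as the upper endpoint, and the names `estimated_slowdown`, `BASE_OVERHEAD_US`, `MAX_OVERHEAD_US`, and `MAX_SLOWDOWN` are all assumptions made for illustration only.

```python
# Hypothetical linear interpolation between the two data points quoted in
# the abstract: no slowdown at 3 us overhead, a factor of 60 at 103 us.
# Assumption: slowdown grows as a straight line between those endpoints.

BASE_OVERHEAD_US = 3.0    # baseline overhead from the abstract
MAX_OVERHEAD_US = 103.0   # increased overhead from the abstract
MAX_SLOWDOWN = 60.0       # slowdown factor reported at 103 us on 32 processors


def estimated_slowdown(overhead_us: float) -> float:
    """Return the interpolated slowdown factor for a given overhead in microseconds."""
    if overhead_us < BASE_OVERHEAD_US:
        raise ValueError("overhead below the 3 us baseline is outside this sketch")
    # Slope of the assumed line: extra slowdown per added microsecond of overhead.
    slope = (MAX_SLOWDOWN - 1.0) / (MAX_OVERHEAD_US - BASE_OVERHEAD_US)
    return 1.0 + slope * (overhead_us - BASE_OVERHEAD_US)


if __name__ == "__main__":
    for overhead in (3.0, 13.0, 53.0, 103.0):
        print(f"{overhead:6.1f} us -> slowdown x{estimated_slowdown(overhead):.2f}")
```

Under these assumptions each added microsecond of overhead costs about 0.59 in slowdown factor. The figure applies only to the most overhead-sensitive case the abstract cites; individual applications in the study would have their own slopes.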


BibTeX citation:

@techreport{Martin:CSD-96-925,
    Author = {Martin, Richard and Vahdat, Amin and Culler, David and Anderson, Thomas},
    Title = {Effect of Communication Latency, Overhead, and Bandwidth on a Cluster Architecture},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {1996},
    Month = {Nov},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/1996/6209.html},
    Number = {UCB/CSD-96-925},
    Abstract = {This work provides a systematic study of the impact of communication performance on parallel applications in a high performance network of workstations. We develop an experimental system in which the communication latency, overhead, and bandwidth can be independently varied to observe the effects on a wide range of applications. Our results indicate that current efforts to improve cluster communication performance to that of tightly integrated parallel machines result in significantly improved application performance. We show that applications demonstrate strong sensitivity to overhead, slowing down by a factor of 60 on 32 processors when overhead is increased from 3 to 103 $\mu$s. Applications in this study are also sensitive to per-message bandwidth, but are surprisingly tolerant of increased latency and lower per-byte bandwidth. Finally, most applications demonstrate a highly linear dependence on both overhead and per-message bandwidth, indicating that further improvements in communication performance will continue to improve application performance.}
}

EndNote citation:

%0 Report
%A Martin, Richard
%A Vahdat, Amin
%A Culler, David
%A Anderson, Thomas
%T Effect of Communication Latency, Overhead, and Bandwidth on a Cluster Architecture
%I EECS Department, University of California, Berkeley
%D 1996
%@ UCB/CSD-96-925
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/1996/6209.html
%F Martin:CSD-96-925