Evaluation of Architectural Support for Global Address-Based Communication in Large-Scale Parallel Machines
Arvind Krishnamurthy, Klaus E. Schauser, Chris J. Scheiman, David E. Culler, Katherine Yelick, and Randolph Y. Wang
EECS Department, University of California, Berkeley
Technical Report No. UCB/CSD-98-984, 1998
http://www2.eecs.berkeley.edu/Pubs/TechRpts/1998/CSD-98-984.pdf
Large-scale parallel machines are incorporating increasingly sophisticated architectural support for user-level messaging and global memory access. We provide a systematic evaluation of a broad spectrum of current design alternatives based on our implementations of a global address language on the Thinking Machine CM-5, Intel Paragon, Meiko CS-2, Cray T3D, and Berkeley NOW. This evaluation includes a range of compilation strategies that make varying use of the network processor; each is optimized for the target architecture and the particular strategy. We analyze a family of interacting issues that determine the performance tradeoffs in each implementation, quantify the resulting latency, overhead, and bandwidth of the global access operations, and demonstrate the effects on application performance.
BibTeX citation:
@techreport{Krishnamurthy:CSD-98-984,
    Author = {Krishnamurthy, Arvind and Schauser, Klaus E. and Scheiman, Chris J. and Culler, David E. and Yelick, Katherine and Wang, Randolph Y.},
    Title = {Evaluation of Architectural Support for Global Address-Based Communication in Large-Scale Parallel Machines},
    Year = {1998},
    Month = {Jan},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/1998/5427.html},
    Number = {UCB/CSD-98-984},
    Abstract = {Large-scale parallel machines are incorporating increasingly sophisticated architectural support for user-level messaging and global memory access. We provide a systematic evaluation of a broad spectrum of current design alternatives based on our implementations of a global address language on the Thinking Machine CM-5, Intel Paragon, Meiko CS-2, Cray T3D, and Berkeley NOW. This evaluation includes a range of compilation strategies that make varying use of the network processor; each is optimized for the target architecture and the particular strategy. We analyze a family of interacting issues that determine the performance tradeoffs in each implementation, quantify the resulting latency, overhead, and bandwidth of the global access operations, and demonstrate the effects on application performance.},
}
EndNote citation:
%0 Report
%A Krishnamurthy, Arvind
%A Schauser, Klaus E.
%A Scheiman, Chris J.
%A Culler, David E.
%A Yelick, Katherine
%A Wang, Randolph Y.
%T Evaluation of Architectural Support for Global Address-Based Communication in Large-Scale Parallel Machines
%I EECS Department, University of California, Berkeley
%D 1998
%@ UCB/CSD-98-984
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/1998/5427.html
%F Krishnamurthy:CSD-98-984