Empirical Evaluation of Global Memory Support on the Cray-T3D and Cray-T3E
Arvind Krishnamurthy and David E. Culler and Katherine Yelick
EECS Department, University of California, Berkeley
Technical Report No. UCB/CSD-98-991, 1998
http://www2.eecs.berkeley.edu/Pubs/TechRpts/1998/CSD-98-991.pdf
We perform an empirical comparison of the Cray-T3E and the Cray-T3D from the perspective of compiling a global address space language, Split-C. Both machines provide an elaborate shell to support different forms of global memory access. We provide a detailed performance characterization of the machine primitives and evaluate their utility in code generation for a parallel language. We observe that the changes made in the T3E result in fewer options for implementing remote access and, overall, a much simpler compilation strategy. Unfortunately, the raw hardware performance of some of the remote access mechanisms is worse on the T3E due to extra logic on the destination processor and the presence of a second-level cache on the source processor. However, the language implementation adds less overhead than a corresponding implementation on the T3D, resulting in better end-to-end performance for some of the primitives.
BibTeX citation:
@techreport{Krishnamurthy:CSD-98-991,
    Author= {Krishnamurthy, Arvind and Culler, David E. and Yelick, Katherine},
    Title= {Empirical Evaluation of Global Memory Support on the Cray-T3D and Cray-T3E},
    Year= {1998},
    Month= {Aug},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/1998/5246.html},
    Number= {UCB/CSD-98-991},
    Abstract= {We perform an empirical comparison of the Cray-T3E and the Cray-T3D from the perspective of compiling a global address space language, Split-C. Both machines provide an elaborate shell to support different forms of global memory access. We provide a detailed performance characterization of the machine primitives and evaluate their utility in code generation for a parallel language. We observe that the changes made in the T3E result in fewer options for implementing remote access and, overall, a much simpler compilation strategy. Unfortunately, the raw hardware performance of some of the remote access mechanisms is worse on the T3E due to extra logic on the destination processor and the presence of a second-level cache on the source processor. However, the language implementation adds less overhead than a corresponding implementation on the T3D, resulting in better end-to-end performance for some of the primitives.},
}
EndNote citation:
%0 Report
%A Krishnamurthy, Arvind
%A Culler, David E.
%A Yelick, Katherine
%T Empirical Evaluation of Global Memory Support on the Cray-T3D and Cray-T3E
%I EECS Department, University of California, Berkeley
%D 1998
%@ UCB/CSD-98-991
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/1998/5246.html
%F Krishnamurthy:CSD-98-991