Microbenchmarking and Performance Prediction for Parallel Computers

Stephen J. Von Worley and Alan Jay Smith

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-95-873
May 1995

http://www2.eecs.berkeley.edu/Pubs/TechRpts/1995/CSD-95-873.pdf

Previous research on this project, by Saavedra and Smith, evaluated the performance of sequential computers. That work presented (a) measurements of machines at the source language primitive operation level; (b) analysis of standard benchmarks; (c) prediction of run times based on separate measurements of the machines and the programs; (d) analysis of the effectiveness of compiler optimizations; and (e) measurements of the performance and design of cache memories.

In this paper, we extend the earlier work to parallel computers. We describe a portable benchmarking suite and performance prediction methodology that accurately predicts the run times of Fortran 90 programs running on supercomputers. The benchmarking suite measures the optimization capabilities of a given Fortran 90 compiler, the execution rates of abstract Fortran 90 operations, and the processing characteristics of the underlying architecture as exposed by compiler-generated code. To predict the run time of an arbitrary program, we combine our benchmark results with dynamic execution measurements, and augment the resulting prediction with simple factors that account for overhead due to architecture-specific effects, such as remote reference latencies. We measure two supercomputers: a dedicated 128-node TMC CM-5 (a distributed-memory multiprocessor) and a 4-node partition of a Cray YMP-C90 (a tightly integrated shared-memory multiprocessor). Our measurements show that the performance of the YMP-C90 far outstrips that of the CM-5, owing to the quality of the compilers available and the architectural characteristics of each machine. To validate our prediction methodology, we predict the run times of five interesting kernels on these machines; nearly all of the predicted run times are within 50 percent of the actual run times, much closer than might be expected.
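The prediction step described above (combining benchmark results with dynamic execution measurements, then adjusting for architecture-specific overhead) can be illustrated with a minimal sketch. It assumes a linear model: the base prediction is the sum, over abstract operations, of each operation's dynamic execution count times its measured per-execution time. How the paper's overhead factors are actually combined is not stated here, so this sketch assumes a single multiplicative factor. The function name `predict_runtime`, the operation names, and all numbers are hypothetical.

```python
from typing import Mapping


def predict_runtime(
    op_counts: Mapping[str, int],
    op_times: Mapping[str, float],
    overhead_factor: float = 1.0,
) -> float:
    """Predict run time in seconds as overhead_factor * sum(count_i * time_i).

    op_counts: dynamic execution count of each abstract operation
        (hypothetical operation names, e.g. from an instrumented run).
    op_times: measured seconds per execution of each operation
        (e.g. from microbenchmarks on the target machine).
    overhead_factor: ASSUMED multiplicative correction for
        architecture-specific effects such as remote reference latency.
    """
    # Every operation the program executes must have a benchmark time.
    missing = set(op_counts) - set(op_times)
    if missing:
        raise KeyError(f"no benchmark time for operations: {sorted(missing)}")
    # Linear model: total time is count times per-operation cost, summed.
    base = sum(count * op_times[op] for op, count in op_counts.items())
    return overhead_factor * base


# Hypothetical example data, not measurements from the paper.
counts = {"array_add": 1_000_000, "array_mul": 500_000, "reduction_sum": 10}
times = {"array_add": 2e-9, "array_mul": 3e-9, "reduction_sum": 1e-5}

print(predict_runtime(counts, times))        # approximately 0.0036 seconds
print(predict_runtime(counts, times, 1.25))  # approximately 0.0045 seconds
```

Keeping the machine characterization (`op_times`) separate from the program characterization (`op_counts`) means each machine is benchmarked once and each program is measured once; any program can then be predicted on any benchmarked machine.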


BibTeX citation:

@techreport{VonWorley:CSD-95-873,
    Author = {Von Worley, Stephen J. and Smith, Alan Jay},
    Title = {Microbenchmarking and Performance Prediction for Parallel Computers},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {1995},
    Month = {May},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/1995/5631.html},
    Number = {UCB/CSD-95-873},
    Abstract = {Previous research on this project (in work by Saavedra and Smith) has presented performance evaluation of sequential computers. That work presented (a) measurements of machines at the source language primitive operation level; (b) analysis of standard benchmarks; (c) prediction of run times based on separate measurements of the machines and the programs; (d) analysis of the effectiveness of compiler optimizations; and (e) measurements of the performance and design of cache memories. In this paper, we extend the earlier work to parallel computers. We describe a portable benchmarking suite and performance prediction methodology, which accurately predicts the run times of Fortran 90 programs running upon supercomputers. The benchmarking suite measures the optimization capabilities of a given Fortran 90 compiler, execution rates of abstract Fortran 90 operations, and the processing characteristics of the underlying architecture as exposed by compiler-generated code. To predict the run time of an arbitrary program, we combine our benchmark results with dynamic execution measurements, and augment the resulting prediction with simple factors which account for overhead due to architecture-specific effects, such as remote reference latencies. We measure two supercomputers: a dedicated 128-node TMC CM-5, a distributed memory multiprocessor, and a 4-node partition of a Cray YMP-C90, a tightly-integrated shared memory multiprocessor. Our measurements show that the performance of the YMP-C90 far outstrips that of the CM-5, due to the quality of the compilers available and the architectural characteristics of each machine. To validate our prediction methodology, we predict the run time of five interesting kernels on these machines; nearly all of the predicted run times are within 50-percent of actual run times, much closer than might be expected.}
}

EndNote citation:

%0 Report
%A Von Worley, Stephen J.
%A Smith, Alan Jay
%T Microbenchmarking and Performance Prediction for Parallel Computers
%I EECS Department, University of California, Berkeley
%D 1995
%@ UCB/CSD-95-873
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/1995/5631.html
%F Von Worley:CSD-95-873