Hardware-Assisted Replay of Multiprocessor Programs

David F. Bacon and Seth Copen Goldstein

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-91-624
August 1991

http://www2.eecs.berkeley.edu/Pubs/TechRpts/1991/CSD-91-624.pdf

Shared-memory parallel programs can be highly non-deterministic because of the unpredictable order in which shared references are satisfied. However, deterministic execution is extremely important for debugging, and it can also be used for fault tolerance and other replay-based algorithms. We present a hardware/software design that allows the order of memory references in a parallel program to be logged efficiently by recording a subset of the cache traffic between memory and the CPUs. This log can then be used, together with hardware and software control, to replay execution.
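The record/replay idea can be illustrated with a small software model. This is a simplified, hypothetical sketch, not the authors' hardware design. It assumes that the logged "subset of the cache traffic" consists of cache-line fills from memory, one (CPU, line) record per fill, and that cache hits need no record because they are satisfied locally. During replay, a gate grants each CPU its next fill only when that fill is at the head of the log, so the original interleaving is reproduced even when processor timing differs. The names `Fill`, `FillLogger`, `ReplayGate`, `replay`, and the 16-byte line size are illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fill:
    """One logged cache fill (hypothetical record format): which CPU got which line."""
    cpu: int
    line: int  # cache-line number = byte address // line size


@dataclass
class FillLogger:
    """Hypothetical recorder: logs fills from memory, ignores cache hits."""
    line_size: int = 16  # assumed line size in bytes
    log: list = field(default_factory=list)

    def on_fill(self, cpu: int, addr: int) -> None:
        # Only memory-to-cache traffic is recorded; the order of these
        # records captures the order in which shared lines were obtained.
        self.log.append(Fill(cpu, addr // self.line_size))


class ReplayGate:
    """Releases fills strictly in the recorded order; other requests stall."""

    def __init__(self, log):
        self._pending = deque(log)

    def try_fill(self, cpu: int, line: int) -> bool:
        if self._pending and self._pending[0] == Fill(cpu, line):
            self._pending.popleft()
            return True
        return False  # not this CPU's turn yet: it must retry later

    def done(self) -> bool:
        return not self._pending


def replay(log, programs):
    """Re-execute per-CPU fill sequences (line numbers, in program order) under the gate."""
    gate = ReplayGate(log)
    pcs = {cpu: 0 for cpu in programs}
    granted = []
    while not gate.done():
        progressed = False
        # Visit CPUs in reverse order on purpose: the timing differs from the
        # recorded run, yet the gate still enforces the logged order.
        for cpu in sorted(programs, reverse=True):
            pc = pcs[cpu]
            if pc < len(programs[cpu]) and gate.try_fill(cpu, programs[cpu][pc]):
                granted.append(Fill(cpu, programs[cpu][pc]))
                pcs[cpu] = pc + 1
                progressed = True
        if not progressed:
            raise RuntimeError("replay diverged from the log")
    return granted


# Recorded run: CPU 0 obtains shared line 0x10 before CPU 1 does.
rec = FillLogger(line_size=16)
rec.on_fill(0, 0x100)  # line 0x10
rec.on_fill(1, 0x104)  # same line 0x10, now taken by CPU 1
rec.on_fill(0, 0x200)  # line 0x20
programs = {0: [0x10, 0x20], 1: [0x10]}
order = replay(rec.log, programs)
```

Here `order` equals `rec.log`: although the replay loop tries CPU 1 first, the gate stalls it until CPU 0 has obtained line 0x10, exactly as in the recorded run. A real device would enforce this at the memory interface rather than in a software loop.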

Simulation of several parallel programs shows that our device records no more than 1.17 MB/second for an application exhibiting fine-grained sharing behavior on a 16-way multiprocessor built from 12-MIPS CPUs. In addition, the device introduces no probe effect or performance degradation. This represents an improvement of several orders of magnitude in both performance and log size over the purely software-based methods proposed previously.
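To put the 1.17 MB/second bound in per-processor and per-instruction terms, a rough worked calculation follows. It assumes that 1 MB means 10^6 bytes and that all 16 CPUs sustain their full 12 MIPS; both are simplifying assumptions, so the figures are indicative only.

```latex
\[
\frac{1.17\times10^{6}\ \text{B/s}}{16\ \text{CPUs}} \approx 7.3\times10^{4}\ \text{B/s per CPU}
\]
\[
\frac{1.17\times10^{6}\ \text{B/s}}{16 \times 12\times10^{6}\ \text{instr/s}}
  \approx 6.1\times10^{-3}\ \text{B/instr}
  \quad(\text{about one log byte per } 164 \text{ instructions})
\]
\[
1.17\times10^{6}\ \text{B/s} \times 3600\ \text{s} \approx 4.2\times10^{9}\ \text{B per hour of execution}
\]
```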


BibTeX citation:

@techreport{Bacon:CSD-91-624,
    Author = {Bacon, David F. and Goldstein, Seth Copen},
    Title = {Hardware-Assisted Replay of Multiprocessor Programs},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {1991},
    Month = {Aug},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/1991/6401.html},
    Number = {UCB/CSD-91-624},
    Abstract = {Shared-memory parallel programs can be highly non-deterministic due to the unpredictable order in which shared references are satisfied. However, deterministic execution is extremely important for debugging and can also be used for fault-tolerance and other replay-based algorithms. We present a hardware/software design that allows the order of memory references in a parallel program to be logged efficiently by recording a subset of the cache traffic between memory and the CPUs. This log can then be used along with hardware and software control to replay execution. Simulation of several parallel programs shows that our device records no more than 1.17 MB/second for an application exhibiting fine-grained sharing behavior on a 16-way multiprocessor consisting of 12-MIPS CPUs. In addition, no probe effect or performance degradation is introduced. This represents several orders of magnitude improvement in both performance and log size over purely software-based methods proposed previously.}
}

EndNote citation:

%0 Report
%A Bacon, David F.
%A Goldstein, Seth Copen
%T Hardware-Assisted Replay of Multiprocessor Programs
%I EECS Department, University of California, Berkeley
%D 1991
%@ UCB/CSD-91-624
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/1991/6401.html
%F Bacon:CSD-91-624