DCR: Replay-Debugging for the Datacenter

Gautam Altekar and Ion Stoica

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2010-33
March 21, 2010

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-33.pdf

We’ve built a tool for debugging non-deterministic failures in production datacenter applications. Our system, called DCR, is the first to efficiently record and replay large-scale, distributed, and data-intensive systems such as HDFS/GFS, HBase/Bigtable, and Hadoop/MapReduce. The enabling idea behind DCR is that debugging doesn’t require a precise replica of the original datacenter run. Instead, it suffices to produce some run that exhibits the original control-plane behavior. This report details the design and implementation of DCR and provides preliminary results.


BibTeX citation:

@techreport{Altekar:EECS-2010-33,
    Author = {Altekar, Gautam and Stoica, Ion},
    Title = {DCR: Replay-Debugging for the Datacenter},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2010},
    Month = {Mar},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-33.html},
    Number = {UCB/EECS-2010-33},
    Abstract = {We’ve built a tool for debugging non-deterministic failures
in production datacenter applications. Our system,
called DCR, is the first to efficiently record and replay
large-scale, distributed, and data-intensive systems such
as HDFS/GFS, HBase/Bigtable, and Hadoop/MapReduce.
The enabling idea behind DCR is that debugging
doesn’t require a precise replica of the original datacenter
run. Instead, it suffices to produce some run that exhibits
the original control-plane behavior. This report details
the design and implementation of DCR and provides preliminary
results.}
}

EndNote citation:

%0 Report
%A Altekar, Gautam
%A Stoica, Ion
%T DCR: Replay-Debugging for the Datacenter
%I EECS Department, University of California, Berkeley
%D 2010
%8 March 21
%@ UCB/EECS-2010-33
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-33.html
%F Altekar:EECS-2010-33