DCR: Replay-Debugging for the Datacenter
Gautam Altekar and Ion Stoica
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2010-33
March 21, 2010
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-33.pdf
We’ve built a tool for debugging non-deterministic failures in production datacenter applications. Our system, called DCR, is the first to efficiently record and replay large-scale, distributed, and data-intensive systems such as HDFS/GFS, HBase/Bigtable, and Hadoop/MapReduce. The enabling idea behind DCR is that debugging doesn’t require a precise replica of the original datacenter run. Instead, it suffices to produce some run that exhibits the original control-plane behavior. This report details the design and implementation of DCR and provides preliminary results.
BibTeX citation:
@techreport{Altekar:EECS-2010-33,
    Author = {Altekar, Gautam and Stoica, Ion},
    Title = {DCR: Replay-Debugging for the Datacenter},
    Year = {2010},
    Month = {Mar},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-33.html},
    Number = {UCB/EECS-2010-33},
    Abstract = {We’ve built a tool for debugging non-deterministic failures in production datacenter applications. Our system, called DCR, is the first to efficiently record and replay large scale, distributed, and data-intensive systems such as HDFS/GFS, HBase/Bigtable, and Hadoop/MapReduce. The enabling idea behind DCR is that debugging doesn’t require a precise replica of the original datacenter run. Instead, it suffices to produce some run that exhibits the original control-plane behavior. This report details the design and implementation of DCR and provides preliminary results.},
}
EndNote citation:
%0 Report
%A Altekar, Gautam
%A Stoica, Ion
%T DCR: Replay-Debugging for the Datacenter
%I EECS Department, University of California, Berkeley
%D 2010
%8 March 21
%@ UCB/EECS-2010-33
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-33.html
%F Altekar:EECS-2010-33