Focus Replay Debugging Effort On the Control Plane

Gautam Altekar and Ion Stoica

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2010-88
May 29, 2010

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-88.pdf

Replay debugging systems enable the reproduction and debugging of non-deterministic failures in production application runs. However, no existing replay system is suitable for datacenter applications like Cassandra, Hadoop, and Hypertable. On these large-scale, distributed, data-intensive programs, existing replay methods either incur excessive production recording overheads or are unable to provide high-fidelity replay.

In this position paper, we hypothesize and empirically verify that control plane determinism is the key to record-efficient and high-fidelity replay of datacenter applications. The key idea behind control plane determinism is that debugging does not always require a precise replica of the original application run. Instead, it often suffices to produce some run that exhibits the original behavior of the control plane: the application code responsible for controlling and managing data flow through a datacenter system.
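The idea of reproducing only control-plane behavior can be illustrated with a toy sketch. This is not the authors' system: every name below (`Message`, `Node`, `record_run`, `replay_run`) is hypothetical, and the split between a routing decision (control plane) and a payload checksum (data plane) is an assumption made purely for illustration. The production run logs only the control-plane inputs (message keys and a non-deterministic load reading), never the payloads; the replay feeds those inputs back with placeholder payloads and reproduces the original routing decisions, but not the original data-plane results.

```python
import os
import random
import zlib
from dataclasses import dataclass, field


@dataclass
class Message:
    key: str        # control-plane metadata (hypothetical)
    payload: bytes  # data-plane bytes, typically much larger than the key


@dataclass
class Node:
    """Hypothetical toy node: routes each message (control plane) and
    checksums its payload (data plane)."""
    num_servers: int = 4
    decisions: list = field(default_factory=list)  # control-plane trace
    checksums: list = field(default_factory=list)  # data-plane results

    def handle(self, msg: Message, load: int) -> None:
        # Control plane: the routing decision depends only on the key and a
        # non-deterministic load reading, never on the payload.
        server = (zlib.crc32(msg.key.encode()) + load) % self.num_servers
        self.decisions.append((msg.key, server))
        # Data plane: bulk work over the payload bytes.
        self.checksums.append(zlib.crc32(msg.payload))


def record_run(messages, rng):
    """Production run that logs only control-plane inputs (key, load)."""
    node, log = Node(), []
    for msg in messages:
        load = rng.randrange(100)      # source of non-determinism
        log.append((msg.key, load))    # the payload is deliberately not logged
        node.handle(msg, load)
    return node, log


def replay_run(log):
    """Replay that re-feeds the logged inputs with empty placeholder payloads."""
    node = Node()
    for key, load in log:
        node.handle(Message(key, b""), load)
    return node


if __name__ == "__main__":
    msgs = [Message(f"row-{i}", os.urandom(1 << 16)) for i in range(10)]
    original, log = record_run(msgs, random.Random())
    replayed = replay_run(log)
    # The control-plane trace is reproduced exactly.
    assert replayed.decisions == original.decisions
    # The data-plane results generally differ, since payloads were not logged.
    print("control plane matches:", replayed.decisions == original.decisions)
    print("data plane matches:", replayed.checksums == original.checksums)
```

In this toy, the log holds a short key and an integer per message, while the unrecorded payloads are 64 KiB each. The sketch deliberately sidesteps the harder case in which data-plane values influence control-plane decisions; there, empty placeholder payloads would not be enough to reproduce the original control-plane behavior.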


BibTeX citation:

@techreport{Altekar:EECS-2010-88,
    Author = {Altekar, Gautam and Stoica, Ion},
    Title = {Focus Replay Debugging Effort On the Control Plane},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2010},
    Month = {May},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-88.html},
    Number = {UCB/EECS-2010-88},
    Abstract = {Replay debugging systems enable the reproduction and
debugging of non-deterministic failures in production
application runs. However, no existing replay system
is suitable for datacenter applications like Cassandra,
Hadoop, and Hypertable. On these large scale,
distributed, and data intensive programs, existing replay
methods either incur excessive production recording
overheads or are unable to provide high fidelity replay.

In this position paper, we hypothesize and empirically
verify that control plane determinism is the key to record-efficient
and high-fidelity replay of datacenter applications.
The key idea behind control plane determinism is
that debugging does not always require a precise replica
of the original application run. Instead, it often suffices
to produce some run that exhibits the original behavior
of the control plane: the application code responsible for
controlling and managing data flow through a datacenter
system.}
}

EndNote citation:

%0 Report
%A Altekar, Gautam
%A Stoica, Ion
%T Focus Replay Debugging Effort On the Control Plane
%I EECS Department, University of California, Berkeley
%D 2010
%8 May 29
%@ UCB/EECS-2010-88
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-88.html
%F Altekar:EECS-2010-88