Incremental Checkpointing with Application to Distributed Discrete Event Simulation

Thomas Huining Feng and Edward A. Lee

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2006-37
April 9, 2006

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-37.pdf

Checkpointing is widely used in robust fault-tolerant applications. We present an efficient incremental checkpointing mechanism. It requires to record only the the state changes and not the complete state. After the creation of a checkpoint, state changes are logged incrementally as records in memory, with which an application can spontaneously roll back later. This incrementality allows us to implement checkpointing with high performance. Only small constant time is required for checkpoint creation and state recording. Rollback requires linear time in the number of recorded state changes, which is bounded by the number of state variables times the number of checkpoints. We implement a Java source transformer that automatically converts an existing application into a behavior-preserving one with checkpointing functionality. This transformation is application-independent and application-transparent. A wide range of applications can benefit from this technique. Currently, it has been used for distributed discrete event simulation using the Time Warp technique.

Author Comments: See the published version


BibTeX citation:

@techreport{Feng:EECS-2006-37,
    Author = {Feng, Thomas Huining and Lee, Edward A.},
    Title = {Incremental Checkpointing with Application to Distributed Discrete Event Simulation},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2006},
    Month = {Apr},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-37.html},
    Number = {UCB/EECS-2006-37},
    Note = {See the <a href="http://ptolemy.eecs.berkeley.edu/publications/papers/06/incrementalCheckpointing_WSC06/index.htm">published version</a>},
    Abstract = {Checkpointing is widely used in robust fault-tolerant applications. We present an efficient incremental checkpointing mechanism. It requires to record only the the state changes and not the complete state. After the creation of a checkpoint, state changes are logged incrementally as records in memory, with which an application can spontaneously roll back later. This incrementality allows us to implement checkpointing with high performance. Only small constant time is required for checkpoint creation and state recording. Rollback requires linear time in the number of recorded state changes, which is bounded by the number of state variables times the number of checkpoints. We implement a Java source transformer that automatically converts an existing application into a behavior-preserving one with checkpointing functionality. This transformation is application-independent and application-transparent. A wide range of applications can benefit from this technique. Currently, it has been used for distributed discrete event simulation using the Time Warp technique.}
}

EndNote citation:

%0 Report
%A Feng, Thomas Huining
%A Lee, Edward A.
%T Incremental Checkpointing with Application to Distributed Discrete Event Simulation
%I EECS Department, University of California, Berkeley
%D 2006
%8 April 9
%@ UCB/EECS-2006-37
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-37.html
%F Feng:EECS-2006-37