Design and Evaluation of Distributed Wide-Area On-line Archival Storage Systems

Hakim Weatherspoon

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2006-130

October 13, 2006

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-130.pdf

As the amount of digital assets increase, systems that ensure the durability, integrity, and accessibility of digital data become increasingly important. Distributed on-line archival storage systems are designed for this very purpose. This thesis explores several important challenges pertaining to fault tolerance, repair, and integrity that must be addressed to build such systems.

The first part of this thesis explores how to maintain durability via fault tolerance and repair and presents many insights on how to do so efficiently. Fault tolerance ensures that data is not lost due to server failure. Replication is the canonical solution for data fault tolerance. The challenge is knowing how many replicas to create and where to store them. Fault tolerance alone, however, is not sufficient to prevent data loss as the last replica will eventually fail. Thus, repair is required to replace replicas lost to failure. The system must monitor and detect server failure and create replicas in response. The problem is that not all server failure results in loss of data and the system can be tricked into creating replicas unnecessarily. The challenge is knowing when to create replicas. Both fault tolerance and repair are required to prevent the last replica from being lost, hence, maintain data durability.

The second part of this thesis explores how to ensure the integrity of data. Integrity ensures that the state of data stored in the system always reflects changes made by the owner. It includes non-repudiably binding owner to data and ensuring that only the owner can modify data, returned data is the same as stored, and the last write is returned in subsequent reads. The challenge is efficiency since requiring cryptography and consistency in the wide-area can easily be prohibitive.

Next, we exploit a secure log to efficiently ensure integrity. We demonstrate how the narrow interface of a secure, append-only log simplifies the design of distributed wide-area storage systems. The system inherits the security and integrity properties of the log. We describe how to replicate the log for increased durability while ensuring consistency among the replicas. We present a repair algorithm that maintains sufficient replication levels as machines fail. Finally, the design uses aggregation to improve efficiency. Although simple, this interface is powerful enough to implement a variety of interesting applications.

Finally, we apply the insights and architecture to a prototype called Antiquity. Antiquity efficiently maintains the durability and integrity of data. It has been running in the wide area on 400+ PlanetLab servers where we maintain the consistency, durability, and integrity of nearly 20,000 logs totaling more than 84 GB of data despite the constant churn of servers (a quarter of the servers experience a failure every hour).

Advisors: John D. Kubiatowicz

BibTeX citation:

@phdthesis{Weatherspoon:EECS-2006-130,
Author= {Weatherspoon, Hakim},
Title= {Design and Evaluation of Distributed Wide-Area On-line Archival Storage Systems},
School= {EECS Department, University of California, Berkeley},
Year= {2006},
Month= {Oct},
Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-130.html},
Number= {UCB/EECS-2006-130},
Abstract= {As the amount of digital assets increase, systems that ensure the durability, integrity, and accessibility of digital data become increasingly important. Distributed on-line archival storage systems are designed for this very purpose. This thesis explores several important challenges pertaining to fault tolerance, repair, and integrity that must be addressed to build such systems.

EndNote citation:

%0 Thesis
%A Weatherspoon, Hakim 
%T Design and Evaluation of Distributed Wide-Area On-line Archival Storage Systems
%I EECS Department, University of California, Berkeley
%D 2006
%8 October 13
%@ UCB/EECS-2006-130
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-130.html
%F Weatherspoon:EECS-2006-130