Long-Term Data Maintenance in Wide-Area Storage Systems: A Quantitative Approach

Hakim Weatherspoon, Byung-Gon Chun, Chiu Wah So and John Kubiatowicz

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-05-1404
July 2005

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2005/CSD-05-1404.pdf

Maintaining data replication levels is a fundamental process of wide-area storage systems; to avoid data loss, replicas must be created as storage nodes permanently fail. Many failures in the wide area are transient, however: the node returns with its data intact. Given the goal of minimizing the number of replicas created to maintain a desired replication level, creating replicas in response to transient failures is wasted effort. In this paper, we present a principled way of minimizing maintenance costs while maintaining a desired data availability. Design choices include the type of data redundancy, the number of replicas, the amount of extra redundancy, and data placement. We demonstrate via trace-driven simulation that significant maintenance efficiency gains can be realized in existing storage systems with the correct choice of strategies and parameters. For example, we show that DHash can reduce its costs by a factor of 31 while maintaining the same desired data availability.
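
To make the cost of reacting to transient failures concrete, the following is a minimal sketch and not the method or simulator used in the report. It assumes a simple timeout-based repair policy: a new replica is created only once a node has been unreachable for longer than a timeout. The `Failure` record, the function names, and the trace values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Failure:
    """One failure event in a hypothetical trace (times in hours)."""
    start: float               # when the node became unreachable
    downtime: Optional[float]  # hours until it returns; None means permanent


def repairs_triggered(failures: List[Failure], timeout: float) -> int:
    """Count replicas created when repair starts only after a node has
    been unreachable for more than `timeout` hours (assumed policy)."""
    return sum(
        1 for f in failures
        if f.downtime is None or f.downtime > timeout
    )


def wasted_repairs(failures: List[Failure], timeout: float) -> int:
    """Count repairs triggered by failures that turned out to be transient,
    i.e. the node later returned with its data intact."""
    return sum(
        1 for f in failures
        if f.downtime is not None and f.downtime > timeout
    )


# Hypothetical trace: four transient failures and one permanent failure.
trace = [
    Failure(start=0.0, downtime=0.5),
    Failure(start=3.0, downtime=2.0),
    Failure(start=10.0, downtime=None),   # permanent failure
    Failure(start=12.0, downtime=30.0),   # long but transient outage
    Failure(start=20.0, downtime=0.1),
]

eager = repairs_triggered(trace, timeout=0.0)    # repair on every failure
patient = repairs_triggered(trace, timeout=4.0)  # wait 4 hours before repairing

print(f"eager repairs:   {eager} (wasted: {wasted_repairs(trace, 0.0)})")
print(f"patient repairs: {patient} (wasted: {wasted_repairs(trace, 4.0)})")
```

In this toy trace the eager policy creates a replica for every failure, while the policy with a timeout creates replicas only for the permanent failure and the one long outage. A longer timeout also delays the response to permanent failures, so it trades repair traffic against the time data spends below its desired replication level.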


BibTeX citation:

@techreport{Weatherspoon:CSD-05-1404,
    Author = {Weatherspoon, Hakim and Chun, Byung-Gon and So, Chiu Wah and Kubiatowicz, John},
    Title = {Long-Term Data Maintenance in Wide-Area Storage Systems: A Quantitative Approach},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2005},
    Month = {Jul},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2005/6512.html},
    Number = {UCB/CSD-05-1404},
    Abstract = {Maintaining data replication levels is a fundamental process of wide-area storage systems; replicas must be created as storage nodes permanently fail to avoid data loss. Many failures in the wide-area are transient, however, where the node returns with data intact. Given a goal of minimizing replicas created to maintain a desired replication level, creating replicas in response to transient failures is wasted effort. In this paper, we present a principled way of minimizing costs while maintaining a desired data availability. Design choices include choosing data redundancy type, number of replicas, extra redundancy, and data placement. We demonstrate via trace-driven simulation that significant maintenance efficiency gains can be realized in existing storage systems with the correct choice of strategies and parameters. For example, we show that DHash can reduce its costs by a factor of 31 while maintaining the same desired data availability.}
}

EndNote citation:

%0 Report
%A Weatherspoon, Hakim
%A Chun, Byung-Gon
%A So, Chiu Wah
%A Kubiatowicz, John
%T Long-Term Data Maintenance in Wide-Area Storage Systems: A Quantitative Approach
%I EECS Department, University of California, Berkeley
%D 2005
%@ UCB/CSD-05-1404
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2005/6512.html
%F Weatherspoon:CSD-05-1404