Long-Term Data Maintenance in Wide-Area Storage Systems: A Quantitative Approach
Hakim Weatherspoon, Byung-Gon Chun, Chiu Wah So, and John Kubiatowicz
EECS Department, University of California, Berkeley
Technical Report No. UCB/CSD-05-1404
July 2005
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2005/CSD-05-1404.pdf
Maintaining data replication levels is a fundamental process of wide-area storage systems: to avoid data loss, replicas must be created as storage nodes permanently fail. Many failures in the wide area are transient, however, meaning the node returns with its data intact. Given a goal of minimizing the number of replicas created to maintain a desired replication level, creating replicas in response to transient failures is wasted effort. In this paper, we present a principled way of minimizing costs while maintaining a desired data availability. Design choices include the type of data redundancy, the number of replicas, the amount of extra redundancy, and data placement. We demonstrate via trace-driven simulation that significant maintenance efficiency gains can be realized in existing storage systems with the correct choice of strategies and parameters. For example, we show that DHash can reduce its costs by a factor of 31 while maintaining the same desired data availability.
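Some of the design choices named in the abstract (redundancy type, number of replicas, extra redundancy) can be made concrete with textbook availability formulas. The following is a minimal sketch, not the report's model. It assumes each node is independently available with probability `p`, a simplification that real wide-area traces generally violate. The lazy-repair rule `replicas_to_create` and its `threshold`/`target` parameters are hypothetical illustrations of why extra redundancy lets a system ignore short transient failures.

```python
from math import comb


def replica_availability(p: float, n: int) -> float:
    """Probability that at least one of n full replicas is reachable.

    Simplifying assumption: each hosting node is up independently with probability p.
    """
    return 1.0 - (1.0 - p) ** n


def erasure_availability(p: float, n: int, m: int) -> float:
    """Probability that at least m of n erasure-coded fragments are reachable,
    where any m fragments suffice to rebuild the object (same independence assumption)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(m, n + 1))


def min_replicas(p: float, target: float) -> int:
    """Smallest replica count n whose availability meets the target."""
    if not (0.0 < p <= 1.0) or not (0.0 <= target < 1.0):
        raise ValueError("need 0 < p <= 1 and 0 <= target < 1")
    n = 1
    while replica_availability(p, n) < target:
        n += 1
    return n


def replicas_to_create(reachable: int, threshold: int, target: int) -> int:
    """Hypothetical lazy-repair rule: create nothing while at least `threshold`
    replicas are reachable, so short transient outages trigger no new copies;
    once the count falls below `threshold`, restore it to `target`."""
    return 0 if reachable >= threshold else target - reachable


if __name__ == "__main__":
    # Nodes up 90% of the time (illustrative value, not taken from the report).
    p = 0.9
    print(min_replicas(p, 0.9995))            # replicas needed for 99.95% availability
    print(replica_availability(p, 2))         # 2 full replicas: 2x storage
    print(erasure_availability(p, n=4, m=2))  # 2-of-4 erasure code: also 2x storage
    print(replicas_to_create(reachable=4, threshold=3, target=5))
    print(replicas_to_create(reachable=2, threshold=3, target=5))
```

Under the independence assumption with `p = 0.9`, two full replicas give 0.99 availability, while a 2-of-4 erasure code at the same 2x storage overhead gives about 0.9963. This illustrates why the choice of redundancy type matters. Setting `target` above `threshold` provides the extra redundancy that absorbs transient failures without triggering repairs.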
BibTeX citation:
@techreport{Weatherspoon:CSD-05-1404,
    Author = {Weatherspoon, Hakim and Chun, Byung-Gon and So, Chiu Wah and Kubiatowicz, John},
    Title = {Long-Term Data Maintenance in Wide-Area Storage Systems: A Quantitative Approach},
    Year = {2005},
    Month = {Jul},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2005/6512.html},
    Number = {UCB/CSD-05-1404},
    Abstract = {Maintaining data replication levels is a fundamental process of wide-area storage systems; replicas must be created as storage nodes permanently fail to avoid data loss. Many failures in the wide-area are transient, however, where the node returns with data intact. Given a goal of minimizing replicas created to maintain a desired replication level, creating replicas in response to transient failures is wasted effort. In this paper, we present a principled way of minimizing costs while maintaining a desired data availability. Design choices include choosing data redundancy type, number of replicas, extra redundancy, and data placement. We demonstrate via trace-driven simulation that significant maintenance efficiency gains can be realized in existing storage systems with the correct choice of strategies and parameters. For example, we show that DHash can reduce its costs by a factor of 31 while maintaining the same desired data availability.}
}
EndNote citation:
%0 Report
%A Weatherspoon, Hakim
%A Chun, Byung-Gon
%A So, Chiu Wah
%A Kubiatowicz, John
%T Long-Term Data Maintenance in Wide-Area Storage Systems: A Quantitative Approach
%I EECS Department, University of California, Berkeley
%D 2005
%@ UCB/CSD-05-1404
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2005/6512.html
%F Weatherspoon:CSD-05-1404