RAID-CUBE: The Modern Datacenter Case for RAID

Jayanta Basak and Randy H. Katz

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2015-4

February 5, 2015

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-4.pdf

“Big Data” processing in modern datacenters dramatically increases the volume of data moving between applications and storage. A major challenge is achieving acceptable levels of availability and reliability in an environment characterized by huge storage capacities, large numbers of disk drives, and very high interconnection bandwidth (e.g., 100 petabytes and 17,000 disk drives at CERN). In this paper, we show that existing RAID mechanisms are insufficient and that the mean time to data loss (MTTDL) falls drastically as the number of disks and the data volume increase. We introduce a new high-availability storage configuration, which we call RAID-CUBE, and show that, as the datacenter scales in capacity, it is more resilient to data loss than existing dual-parity and triple-parity RAID schemes. We also identify, for different data protection mechanisms, the limit on datacenter capacity (in number of disks) up to which an acceptable MTTDL can be maintained. Finally, we briefly introduce an effective mechanism for bit error protection for large sequential I/Os in this environment.
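The claim that MTTDL falls as the disk count grows can be illustrated with the textbook independent-failure approximation for parity-protected disk groups. This is not the model the report develops for RAID-CUBE. The sketch below is hedged accordingly. It assumes exponential, independent disk failures, and it assumes a repair time much shorter than the disk MTTF. The parameter values are also assumptions: an MTTF of 1,000,000 hours, a 24-hour rebuild, and 10-disk groups. None of these come from the report, and the helper names `mttdl_group` and `mttdl_system` are hypothetical.

```python
"""Illustrative MTTDL sketch using the classic independent-failure
approximation. Parameters are assumptions; this is NOT the RAID-CUBE
model or data from the report."""

HOURS_PER_YEAR = 24 * 365


def mttdl_group(mttf: float, mttr: float, g: int, parity: int) -> float:
    """Approximate MTTDL (hours) of one group of g disks that survives
    up to `parity` concurrent disk failures.

    Assumes independent, exponentially distributed failures and
    MTTR << MTTF: data is lost only if parity + 1 failures overlap,
    each later failure arriving within the repair window of the earlier ones.
    """
    if not 0 <= parity < g:
        raise ValueError("need 0 <= parity < g")
    denom = 1.0
    for k in range(parity + 1):
        denom *= g - k  # g * (g-1) * ... * (g-parity)
    return mttf ** (parity + 1) / (denom * mttr ** parity)


def mttdl_system(n_disks: int, mttf: float, mttr: float, g: int, parity: int) -> float:
    """MTTDL (hours) of n_disks split into independent groups of size g.
    Data loss in any one group counts as system data loss, so loss
    rates add across groups and the MTTDL divides by the group count."""
    groups = (n_disks + g - 1) // g  # round up to whole groups
    return mttdl_group(mttf, mttr, g, parity) / groups


if __name__ == "__main__":
    MTTF = 1_000_000.0  # hours per disk, assumed
    MTTR = 24.0         # hours to rebuild one disk, assumed
    G = 10              # disks per protection group, assumed
    for label, p in [("single parity", 1), ("dual parity", 2), ("triple parity", 3)]:
        for n in (1_000, 17_000):
            years = mttdl_system(n, MTTF, MTTR, G, p) / HOURS_PER_YEAR
            print(f"{label:13s} {n:6d} disks: MTTDL ~ {years:,.0f} years")
```

Under this simple model, the system MTTDL is inversely proportional to the number of groups. Going from 1,000 to 17,000 disks therefore cuts it by a factor of 17 for every parity level. The model also ignores correlated failures and unrecoverable bit errors during rebuild, both of which push real MTTDL lower still.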

BibTeX citation:

@techreport{Basak:EECS-2015-4,
    Author= {Basak, Jayanta and Katz, Randy H.},
    Title= {RAID-CUBE: The Modern Datacenter Case for RAID},
    Year= {2015},
    Month= {Feb},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-4.html},
    Number= {UCB/EECS-2015-4},
    Abstract= {“Big Data” processing in modern datacenters dramatically increases the data volume moving between applications and storage. A major challenge is achieving acceptable levels of availability and reliability in an environment characterized by huge storage capacities, large numbers of disk drives, and very high interconnection bandwidth (e.g., 100 petabytes and 17000 disk drives at CERN). In this paper, we show that existing RAID mechanisms are insufficient, and that the mean time to data loss (MTTDL) falls drastically as the number of disks and data volume increase. We introduce a new high availability storage configuration, which we call RAID-CUBE, and show that it is more resilient to data loss as the datacenter scales in capacity than existing RAID dual parity and triple parity schemes. We also identify the limits to capacity of a datacenter (in terms of the number of disks) to maintain an acceptable MTTDL for different data protection mechanisms. Finally, we briefly introduce an effective mechanism for bit error protection for large sequential IOs in this environment.},
}

EndNote citation:

%0 Report
%A Basak, Jayanta 
%A Katz, Randy H. 
%T RAID-CUBE: The Modern Datacenter Case for RAID
%I EECS Department, University of California, Berkeley
%D 2015
%8 February 5
%@ UCB/EECS-2015-4
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-4.html
%F Basak:EECS-2015-4