Redundant Disk Arrays: Reliable, Parallel Secondary Storage

Garth Alan Gibson

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-91-613
December 1990

During the past decade, advances in processor and memory technology have given rise to increases in computational performance that far outstrip increases in the performance of secondary storage technology. Coupled with emerging small-disk technology, disk arrays provide the cost, volume, and capacity of current disk subsystems but, by leveraging parallelism, many times their performance. Unfortunately, arrays of small disks may have much higher failure rates than the single large disks they replace. Redundant Arrays of Inexpensive Disks (RAID) use simple redundancy schemes to provide high data reliability. This dissertation investigates the data encoding, performance, and reliability of redundant disk arrays.

Organizing redundant data into a disk array is treated as a coding problem in this dissertation. Among alternatives examined, codes as simple as parity are shown to effectively correct single, self-identifying disk failures.

The performance advantages of striping data across multiple disks are reviewed in this dissertation. For large transfers this parallelism reduces response time. Striping data also automatically distributes independent, small accesses across disks to increase throughput. This dissertation evaluates the performance lost to the maintenance of redundant data. This loss is negligible for large transfers but can be significant for small writes because of increases in aggregate disk service time.

Because disk arrays include redundancy to protect against the high failure rates caused by large numbers of disk components, it is crucial that disk failures be characterized. This dissertation provides evidence that disk lifetimes can be modeled as exponential random variables.

Building on an exponential model for disk lifetimes, this dissertation presents analytic models for disk-array lifetime, evaluates these against event-drived simulation, and applies them to an example redundant disk array. These models incorporate the effects of independent and dependent disk failures (shared support hardware) as well as the effects of on-line spare disks. For the example redundant disk array, these models show that a 10% overhead for an N+1-parity encoding plus a 10% overhead for on-line spares can provide higher reliability than the 100% overhead of conventional mirrored disks.

Advisor: David A. Patterson


BibTeX citation:

@phdthesis{Gibson:CSD-91-613,
    Author = {Gibson, Garth Alan},
    Title = {Redundant Disk Arrays: Reliable, Parallel Secondary Storage},
    School = {EECS Department, University of California, Berkeley},
    Year = {1990},
    Month = {Dec},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/1990/6373.html},
    Number = {UCB/CSD-91-613},
    Abstract = {During the past decade, advances in processor and memory technology have given rise to increases in computational performance that far outstrip increases in the performance of secondary storage technology. Coupled with emerging small-disk technology, disk arrays provide the cost, volume, and capacity of current disk subsystems but, by leveraging parallelism, many times their performance. Unfortunately, arrays of small disks may have much higher failure rates than the single large disks they replace. Redundant Arrays of Inexpensive Disks (RAID) use simple redundancy schemes to provide high data reliability. This dissertation investigates the data encoding, performance, and reliability of redundant disk arrays. <p> Organizing redundant data into a disk array is treated as a coding problem in this dissertation.  Among alternatives examined, codes as simple as parity are shown to effectively correct single, self-identifying disk failures. <p> The performance advantages of striping data across multiple disks are reviewed in this dissertation. For large transfers this parallelism reduces response time. Striping data also automatically distributes independent, small accesses across disks to increase throughput. This dissertation evaluates the performance lost to the maintenance of redundant data. This loss is negligible for large transfers but can be significant for small writes because of increases in aggregate disk service time. <p> Because disk arrays include redundancy to protect against the high failure rates caused by large numbers of disk components, it is crucial that disk failures be characterized. This dissertation provides evidence that disk lifetimes can be modeled as exponential random variables. <p> Building on an exponential model for disk lifetimes, this dissertation presents analytic models for disk-array lifetime, evaluates these against event-drived simulation, and applies them to an example redundant disk array. These models incorporate the effects of independent and dependent disk failures (shared support hardware) as well as the effects of on-line spare disks. For the example redundant disk array, these models show that a 10% overhead for an N+1-parity encoding plus a 10% overhead for on-line spares can provide higher reliability than the 100% overhead of conventional mirrored disks.}
}

EndNote citation:

%0 Thesis
%A Gibson, Garth Alan
%T Redundant Disk Arrays: Reliable, Parallel Secondary Storage
%I EECS Department, University of California, Berkeley
%D 1990
%@ UCB/CSD-91-613
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/1990/6373.html
%F Gibson:CSD-91-613