What is RAID 5?

RAID 5 is a block-level striping scheme. A RAID 5 disk array combines block striping with distributed parity:

[Figure: RAID 5 array layout]

As the diagram above shows, a file of 9 disk blocks is written to an N-drive RAID 5 array (N=4 in this example). The storage controller first divides the file into stripes of (N-1) data blocks each, then calculates a parity block for each stripe. Each stripe of N blocks (N-1 data blocks + 1 parity block) is then spread evenly across the N drives in the array, one block per drive. This makes RAID 5 a high-performance, cost-effective (in terms of drive utilisation) and fault-tolerant storage option, especially suitable for smaller studios and SOHO environments.
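The striping step can be sketched in a few lines of Python. This is only an illustration, not a controller implementation: the function names are made up, parity is computed with XOR (the usual choice in practice), a short final stripe is padded with zero blocks, and the rotating parity position is an assumed placement.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def stripe_with_parity(data_blocks, n_drives):
    """Group data blocks into stripes of (n_drives - 1) and add one parity block.

    Returns a list of rows; rows[i][d] is the block of stripe i stored on drive d.
    The parity position rotates from the last drive towards the first
    (an assumed placement, chosen for illustration).
    """
    per_stripe = n_drives - 1
    block_size = len(data_blocks[0])
    rows = []
    for start in range(0, len(data_blocks), per_stripe):
        chunk = list(data_blocks[start:start + per_stripe])
        # Pad a short final stripe with zero blocks so parity is well defined.
        chunk += [bytes(block_size)] * (per_stripe - len(chunk))
        parity = xor_blocks(chunk)
        stripe_index = start // per_stripe
        parity_drive = (n_drives - 1 - stripe_index) % n_drives
        rows.append(chunk[:parity_drive] + [parity] + chunk[parity_drive:])
    return rows

# Nine 4-byte blocks written to a 4-drive array.
data = [bytes([i] * 4) for i in range(1, 10)]
rows = stripe_with_parity(data, 4)
```

With nine blocks and four drives this yields three rows of four blocks each. XOR-ing all four blocks of any row gives all-zero bytes, which is the property a rebuild relies on.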

Rebuilding a Degraded RAID 5 Array

A RAID 5 array provides data redundancy only while all drives are working normally. This RAID level has a maximum fault tolerance of one drive, no matter how big the array is.

Whenever a single drive fails, the entire RAID 5 array enters a degraded state in which no data protection remains. When this happens, we need to rebuild the array with a replacement drive. Data protection only returns once the rebuild has finished.

Let's simplify the rebuilding process by assuming that the arithmetic behind the RAID 5 parity block is plain addition:

  • Ap = A1+A2+A3
  • Bp = B1+B2+B3
  • Cp = C1+C2+C3
  • Dp = D1+D2+D3
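The plus sign here is a simplification. RAID 5 implementations normally compute parity with bitwise XOR, which acts as addition and subtraction at once because XOR is its own inverse. For stripe A, written in the same block names as above:

```latex
A_p = A_1 \oplus A_2 \oplus A_3
\quad\Longrightarrow\quad
A_2 = A_p \oplus A_1 \oplus A_3
```

Any single missing block of a stripe is therefore the XOR of the blocks that remain.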

If Disk 1 in the above diagram fails, the given RAID 5 array then becomes:

[Figure: the RAID 5 array with Disk 1 failed]

The corresponding rebuild of Disk 1 essentially recalculates each block of the failed drive, one stripe at a time (A, B, C and D):

  • A2 = Ap - A1 - A3
  • B2 = Bp - B1 - B3
  • Cp = C1 + C2 + C3
  • D1 = Dp - D2 - D3
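The per-stripe recalculation above can be sketched as follows. This is a minimal sketch that uses XOR in place of the simplified plus and minus; `rebuild_drive` and the row layout (`rows[i][d]` is the block of stripe i on drive d, `None` where a block is unavailable) are hypothetical names chosen for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_drive(rows, failed_drive):
    """Recompute every block of failed_drive from the surviving drives.

    Works the same way whether the lost block held data or parity:
    it is the XOR of all other blocks in the stripe.
    """
    rebuilt = []
    for row in rows:
        survivors = [blk for d, blk in enumerate(row) if d != failed_drive]
        if any(blk is None for blk in survivors):
            # Two blocks of one stripe are missing: parity cannot recover both.
            raise ValueError("second missing block in stripe: cannot rebuild")
        rebuilt.append(xor_blocks(survivors))
    return rebuilt

# One stripe on four drives; the block on drive 1 is lost.
a1, a2, a3 = b"\x01\x02", b"\x10\x20", b"\x0f\xf0"
ap = xor_blocks([a1, a2, a3])
recovered = rebuild_drive([[a1, None, a3, ap]], failed_drive=1)
```

The `ValueError` branch covers the case where a surviving drive cannot supply its block for some stripe, in which case that stripe cannot be rebuilt.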

The rebuild looks simple; however, the process is actually risky. In many reported cases a second drive fails while the first failed drive is still being rebuilt, which means all data is lost.

Risk 1: AFR

Based on a paper published by Google at the FAST '07 conference (USENIX Conference on File and Storage Technologies), the annualised failure rate (AFR) of five-year-old hard disks is 8.6%.

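The time needed to rebuild a 10TB disk depends on the sustained rebuild rate. At 100MB/s it takes about 27.78 hours; the 18.52-hour figure used below corresponds to a rate of about 150MB/s:

[Equation image: rebuild time = disk capacity / rebuild rate]

The probability that a second drive fails within the 18.52-hour rebuild window, based on the AFR figure, is: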

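[Equation image: probability of a single drive failing during the rebuild window]

Based on the binomial distribution,

[Equation image: binomial distribution formula]

The probability of a second drive failure in a five-year-old array during the RAID 5 rebuild is:

[Equation image: probability of a second drive failure during the rebuild]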
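The AFR argument can be sketched in a few lines of Python. This is a rough model under stated assumptions, not necessarily the exact calculation behind the figures: it assumes a constant failure rate through the year (so the annual rate scales linearly down to the rebuild window), independent drive failures, and decimal units. The function names are made up for illustration.

```python
def rebuild_hours(capacity_bytes, rate_bytes_per_s):
    """Time to read or write one full disk at a sustained rate, in hours."""
    return capacity_bytes / rate_bytes_per_s / 3600

def p_drive_fails_in_window(afr, hours):
    """Chance that one drive fails within `hours`, scaling the AFR linearly."""
    return afr * hours / (365 * 24)

def p_any_of_remaining_fails(p_single, remaining_drives):
    """Binomial view: 1 minus the chance that none of the remaining drives fails."""
    return 1 - (1 - p_single) ** remaining_drives

# 10 TB at an assumed 150 MB/s, which reproduces the 18.52 hours quoted in the text.
hours = rebuild_hours(10e12, 150e6)
p_one = p_drive_fails_in_window(0.086, hours)
# Three surviving drives in the 4-drive example array.
p_any = p_any_of_remaining_fails(p_one, 3)
print(f"{hours:.2f} h, single drive {p_one:.6%}, any of 3 {p_any:.6%}")
```

Changing the rebuild rate, the AFR or the number of surviving drives shows how quickly the exposure grows with larger disks and wider arrays.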

Risk 2: BER

Bit error rate (BER) in storage systems refers to the rate at which a block simply cannot be read from the disk because the data cannot be recovered through the PRML (partial-response maximum-likelihood) and ECC (error-correcting code) mechanisms. The entire disk does not necessarily fail, but the read operation on certain blocks cannot be completed. In a RAID environment, this type of failure triggers a reconstruction of the block from the remaining disks. Suppose the B2 block written to Disk 1 cannot be retrieved correctly due to a bit error. This block’s data can only be reconstructed from blocks B1, B3 and Bp on the other drives:

  • B2 = Bp - B1 - B3

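[Figure: block B2 reconstructed from B1, B3 and Bp]

Bit errors lead to unrecoverable read errors (UREs), whose rate is normally documented by the disk manufacturer. When a URE happens, the drive itself cannot return the data in the affected block; the only way to recover it is to reconstruct the block from the other drives, which relies on RAID redundancy.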

In general:

  • Enterprise SAS disk – URE = 1e-15 (one error per 1e15 bits read)
  • SATA disk – URE = 1e-14 (one error per 1e14 bits read)

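Let’s look at what this figure means. For an ordinary SATA disk, the probabilities of UREs during an accumulated read of 100TB of data are:

  • No URE incidents:

[Equation image: probability of 0 UREs in 100TB read]

  • 1 URE incident:

[Equation image: probability of 1 URE in 100TB read]

  • 2 URE incidents:

[Equation image: probability of 2 UREs in 100TB read]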
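These probabilities can be estimated with a short Python sketch. It assumes each bit read fails independently at the quoted URE rate and uses the Poisson approximation to the binomial distribution, which is very close when the per-bit rate is this small. Decimal terabytes are assumed, and the function names are illustrative.

```python
import math

def expected_ures(bytes_read, ure_per_bit):
    """Expected number of unrecoverable read errors for a given read volume."""
    return bytes_read * 8 * ure_per_bit

def p_exactly_k_ures(bytes_read, ure_per_bit, k):
    """Poisson approximation of the chance of exactly k UREs."""
    lam = expected_ures(bytes_read, ure_per_bit)
    return math.exp(-lam) * lam ** k / math.factorial(k)

# 100 TB (decimal) at 1e-14 per bit gives an expected count of about 8 UREs.
for k in range(3):
    print(k, p_exactly_k_ures(100e12, 1e-14, k))
```

The expected count is the useful number to keep in mind: once it is well above 1, reading that much data without any URE becomes unlikely.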

If a URE happens during the rebuild of a failed drive:

  • Disk 1 has failed
  • Disk 2 hits a URE on block C2

[Figure: RAID 5 array with Disk 1 failed and a URE on block C2 of Disk 2]

There is no way to recalculate both Cp and C2 from the C1 and C3 blocks alone. The entire C stripe is gone!

That leaves the last question: how likely is a URE to make a RAID 5 rebuild fail? Based on a calculation carried out in a blog post (I have verified all results myself):

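SATA disk (URE = 1e-14)

[Table image: rebuild calculation results for SATA disks]

Enterprise SAS disk (URE = 1e-15)

[Table image: rebuild calculation results for enterprise SAS disks]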
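A calculation of this kind can be sketched as follows. It is not necessarily the method used in the referenced blog post. It assumes a rebuild must read every bit of all N-1 surviving drives, that bit errors are independent, and that a single URE makes the rebuild fail. The function name and the decimal units are assumptions.

```python
import math

def p_rebuild_without_ure(n_drives, disk_bytes, ure_per_bit):
    """Chance that every bit on the surviving drives is read without a URE."""
    bits_to_read = (n_drives - 1) * disk_bytes * 8
    # log1p keeps precision: 1 - 1e-14 is barely distinguishable from 1.0 in floats.
    return math.exp(bits_to_read * math.log1p(-ure_per_bit))

# Compare SATA (1e-14) and enterprise SAS (1e-15) for a 4-drive array.
for disk_tb in (1, 2, 4, 10):
    sata = p_rebuild_without_ure(4, disk_tb * 1e12, 1e-14)
    sas = p_rebuild_without_ure(4, disk_tb * 1e12, 1e-15)
    print(f"{disk_tb:>2} TB  SATA {sata:.2%}  SAS {sas:.2%}")
```

Because the amount of data read scales with both drive count and drive size, the success probability drops exponentially as either grows.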

Recommendations

In conclusion, for consumer-grade SATA drives, the rebuild success rate is very low even for a 4-bay RAID 5 array with 1TB disks. It’s nearly impossible to guarantee a successful rebuild with this type of disk because its URE rate is too high. Another alarming fact about UREs: they have no correlation with the drive’s age.

References

https://static.googleusercontent.com/media/research.google.com/en//archive/disk_failures.pdf

https://permabit.wordpress.com/2008/08/20/are-fibre-channel-and-scsi-drives-more-reliable/

https://www.seagate.com/au/en/tech-insights/advanced-format-4k-sector-hard-drives-master-ti/

https://en.wikipedia.org/wiki/RAID

https://en.wikipedia.org/wiki/Bit_error_rate