How RAID-5 really works

What is RAID?

RAID is an acronym for “Redundant Array of Independent Disks” or, originally, “Redundant Array of Inexpensive Disks.” The main concept of RAID is the ability to take multiple drives and present them as a single virtual drive. There are many different RAID structures, and nearly all of them serve one or both of two primary purposes: aggregated storage space or data redundancy (as in, protection against data loss in the event of hard drive failure). I’m not going into the details of all the RAID levels, as this post is geared towards the lower-level workings of RAID-5. If you wish to learn about all the RAID levels, see the Wikipedia article on RAID.

What is RAID 5?

RAID 5 provides fault tolerance in addition to read performance advantages, allowing data to be safeguarded while only sacrificing the equivalent of one drive’s space. RAID-5 requires at least three hard drives, ideally of the same size, since any capacity beyond that of the smallest drive goes unused. The total storage space available with a RAID-5 array is equal to { (number of drives - 1) * size of smallest drive }. So if you use three 120 GB hard drives, you will have 240 GB of actual usable space. If you use five 120 GB hard drives, you would have 480 GB of usable space. The more drives you use, the more efficient your storage space becomes, and the array can still survive the loss of any one drive.
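The capacity formula above is easy to check in a few lines of Python. This is only a sketch: the function name `raid5_usable_capacity` is made up for illustration, and the sizes are assumed to be in gigabytes.

```python
def raid5_usable_capacity(drive_sizes_gb):
    """Usable space of a RAID-5 array: (number of drives - 1) * smallest drive."""
    if len(drive_sizes_gb) < 3:
        raise ValueError("RAID-5 needs at least three drives")
    return (len(drive_sizes_gb) - 1) * min(drive_sizes_gb)


print(raid5_usable_capacity([120, 120, 120]))  # 240
print(raid5_usable_capacity([120] * 5))        # 480
```

Note that mixing drive sizes does not help: with 120, 160, and 200 GB drives the result is still 240, because every drive is treated as if it were the size of the smallest one.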

Data Redundancy

Your data can survive a complete failure of one hard drive; however, if a second drive fails before the first one has been replaced and rebuilt, ALL data will be lost. It is very important to have an extra drive on hand so that if a drive fails, you can replace it immediately and start the rebuild. The RAID-5 array can actually still be used with one drive completely missing or not working, but performance is degraded because the missing data must be rebuilt on the fly. However, if you do not have an extra drive to plug in right away when one fails, it would be wise to keep the computer and all drives powered off until you can replace the failed drive. You may think, “oh, it will only be a couple of days before the new drive arrives,” but ask yourself this: Is not having access to the data on these drives for only a couple of days worse than taking the risk of losing it all forever if another drive happens to fail? Probably not.

Figure 1. Representation of RAID-5 data structure.

Striping & Parity

Data is “striped” across the hard drives, with one parity block for each stripe. A, B, C, and D represent data “stripes.” Each stripe segment per drive can vary in size; I believe anywhere from 4 KB to 256 KB per segment is normal, and it can be set during setup to adjust performance. The blocks with a subscript P are the parity blocks, which are the XOR of all the other blocks in that stripe (explained in more detail below). The parity is responsible for the data fault tolerance and is also the reason why you lose the amount of space equivalent to one drive.

Taking notice of figure 1, let’s say that the second drive fails. When a new hard drive is put in its place, the RAID controller would rebuild the data automatically. The data in segments A1 and A3 would be combined with the AP parity block, which would allow the data for A2 to be rebuilt. This would take place on each stripe until the entire drive is up to speed, so to speak. Parity blocks are determined by using a logical operation called XOR (Exclusive OR) on binary blocks of data, which is explained further down.
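To make the layout easier to picture, here is a small Python sketch that prints a block layout for a four-drive array. The function name `raid5_layout` is hypothetical, and the rotation rule (parity starts on the last drive and moves one drive to the left on each stripe) is an assumption for illustration; real controllers may rotate parity differently.

```python
def raid5_layout(num_drives, num_stripes):
    """Return a grid of block labels: one row per stripe, one column per drive."""
    rows = []
    for stripe in range(num_stripes):
        name = chr(ord("A") + stripe)  # stripe 0 -> "A", stripe 1 -> "B", ...
        # Assumption: parity starts on the last drive and moves one drive
        # to the left on each following stripe.
        parity_drive = (num_drives - 1 - stripe) % num_drives
        row = []
        data_index = 1
        for drive in range(num_drives):
            if drive == parity_drive:
                row.append(name + "p")
            else:
                row.append(name + str(data_index))
                data_index += 1
        rows.append(row)
    return rows


for row in raid5_layout(num_drives=4, num_stripes=4):
    print("  ".join(row))
# A1  A2  A3  Ap
# B1  B2  Bp  B3
# C1  Cp  C2  C3
# Dp  D1  D2  D3
```

Each row holds exactly one parity block, and no single drive holds all of them, which is the property the rest of this post relies on.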

Performance

RAID 5 offers accelerated read performance because the data stream is accessed from multiple drives at the same time. Referring to figure 1, let’s say that stripe A was a single file. Normally, on a single drive, when you open that file the whole thing would be streamed from the one hard drive bit by bit, so that one hard drive’s max read speed becomes a bottleneck. BUT, with a RAID-5, that one file can be accessed in roughly 1/3 of the time because it will be read from all 3 drives at once; block 1 has the first 1/3 of the file, block 2 has the second 1/3 of the file, and block 3 has the last part of the file. This, in a perfect situation, causes your read speed to be tripled, with even more performance potential in RAID-5 arrays containing additional hard drives!

The downside is increased overhead when writing to the drives, caused by parity calculation. Every write must also update the parity block for its stripe, which for small writes typically means reading the old data and old parity before writing the new data and new parity. If your intended use involves a lot of data writing (such as video recording, a high-traffic server, etc.), RAID-5 may not be the ideal choice.
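As a rough illustration of that write overhead, here is a sketch of how parity can be updated when only one block in a stripe changes, using the XOR operation explained further down. The function name `update_parity` and the 4-bit example values are made up for illustration, and this read-modify-write shortcut is an assumption about how a given controller behaves, not something every implementation is guaranteed to do.

```python
def update_parity(old_parity, old_data, new_data):
    """Parity after one data block changes: cancel the old data, add the new."""
    return old_parity ^ old_data ^ new_data


# Hypothetical stripe with three 4-bit data blocks and their parity.
blocks = [0b1010, 0b0110, 0b0001]
parity = blocks[0] ^ blocks[1] ^ blocks[2]            # 1101

# Overwrite the second block. Two reads (old data, old parity) and
# two writes (new data, new parity) are needed for this one change.
new_block = 0b1111
parity = update_parity(parity, old_data=blocks[1], new_data=new_block)
blocks[1] = new_block

print(format(parity, "04b"))                          # 0100
print(parity == blocks[0] ^ blocks[1] ^ blocks[2])    # True
```

The shortcut gives the same answer as recomputing the parity from every block in the stripe, but it only touches the changed block and the parity block.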

XOR Comparison

Data is stored and processed at the very lowest levels in binary form, which is of course 0s and 1s. There are operations for combining binary bits called logical operators. The one that does the magic of parity creation is called XOR, or Exclusive OR. If you have experience in lower-level programming or electronics, you probably already know what an XOR is.

Figure 2. XOR Inputs/Outputs

Basically, an XOR comparison will take two binary bits, compare them, and output a result of 0 or 1. It will return a 1 ONLY IF the two inputs are different. If both bits are 0, the output is 0; if both bits are 1, the output is 0; if one bit is 0 and the other bit is 1, the output is 1.
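The same input/output table can be printed with Python’s built-in `^` operator, which performs XOR on integers. The helper name `xor_table` is just for illustration.

```python
def xor_table():
    """Return (a, b, a XOR b) for every combination of two single bits."""
    return [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]


for a, b, result in xor_table():
    print(a, "XOR", b, "=", result)
# 0 XOR 0 = 0
# 0 XOR 1 = 1
# 1 XOR 0 = 1
# 1 XOR 1 = 0
```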

Figure 3. Yellow cells represent parity blocks.

Building Parity

To keep things easy to follow, we are only going to be working with 4-bit blocks. Actual data blocks can range from 4 KB (32,768 bits) up to 256 KB (2,097,152 bits), but the method is exactly the same regardless of how many consecutive bits you work with. In figure 3, the yellow blocks represent the parities for each stripe. As you may notice, the parities are distributed evenly between all drives. This provides a slight increase in performance and is what separates RAID-5 from RAID-4 (RAID 4 keeps all parities on a single drive).

Let’s examine the first stripe of figure 3. To compute the parity, we must run the XOR comparison on each block of data in that stripe. You XOR the first two blocks, then take the result and XOR it against the third block (and continue this for all drives in the array, except for the drive where the parity will be stored).

(Drive 1) XOR (Drive 2) = (0100) XOR (0101) = (0001)
(Result) XOR (Drive 3) = (0001) XOR (0010) = (0011)

Let me break that down a little more in case you couldn’t follow. Refer to figure 2 if you have trouble remembering the inputs/outputs for XOR.

First we need to compare the first two drives’ blocks, which are 0100 and 0101. The very first bit comparison is 0 and 0 (the first bits from both blocks), which results in 0, the first bit of our temporary parity. The second set of bits is 1 and 1, which results in 0. So far our temporary parity is 00. Now the third bit comparison is 0 and 0, which returns 0. We are now at 000. The fourth bit comparison is 0 and 1, which results in 1. So the result of (Drive 1) XOR (Drive 2) is 0001. We now must take this block and compare it to drive 3, which is 0010. The XOR of 0001 and 0010 equals 0011: the parity for stripe 1!
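The same calculation can be sketched in Python, treating each 4-bit block as a small integer. The function name `compute_parity` is made up for illustration; `functools.reduce` and `operator.xor` are from the standard library and simply chain the XOR across all blocks, just like the two steps above.

```python
from functools import reduce
from operator import xor


def compute_parity(data_blocks):
    """XOR all data blocks of a stripe together to get its parity block."""
    return reduce(xor, data_blocks)


# Stripe 1 from the walkthrough: drives 1, 2 and 3 hold the data.
stripe_1 = [0b0100, 0b0101, 0b0010]

print(format(stripe_1[0] ^ stripe_1[1], "04b"))    # 0001 (intermediate result)
print(format(compute_parity(stripe_1), "04b"))     # 0011 (parity for stripe 1)
```

Because XOR gives the same result in any order, it does not matter which two blocks you start with.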

Recovering Data

The very cool thing about XOR comparisons, and what makes RAID 5 possible, is that if one value goes missing, you can always find it by doing an XOR comparison on the remaining values! Referring back to figure 3, let’s say that drive 1 fails. The user will be alerted by the RAID controller that a drive has failed and must be replaced. As soon as a new drive is put in, the controller will automatically start rebuilding the lost data. Here is how we rebuild drive 1, stripe 1:

(Drive 2) XOR (Drive 3) = (0101) XOR (0010) = (0111)
(Result) XOR (Drive 4) = (0111) XOR (0011) = (0100)

As you can see, the final result is 0100. Now refer back to figure 3 at drive 1, stripe 1... sure enough, it’s 0100! Amazing, right? Just for fun, let’s rebuild stripe 2 as well, again assuming that it is drive 1 that has failed.

(Drive 2) XOR (Drive 3) = (0000) XOR (0110) = (0110)
(Result) XOR (Drive 4) = (0110) XOR (0100) = (0010)

The missing block was calculated as 0010. Take a look at figure 3 to verify what drive 1, stripe 2 was before the failure and see if it matches the computed value… of course it does!
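Both rebuilds can be reproduced with a short Python sketch. The function name `rebuild_block` is hypothetical, and the block values are the ones used in the two walkthroughs above; a real controller would do the same thing over whole blocks rather than 4 bits at a time.

```python
from functools import reduce
from operator import xor


def rebuild_block(surviving_blocks):
    """Recover a missing block by XOR-ing every surviving block in the stripe
    (the remaining data blocks plus the parity block)."""
    return reduce(xor, surviving_blocks)


# Drive 1 has failed; these are the blocks on drives 2, 3 and 4.
stripe_1_survivors = [0b0101, 0b0010, 0b0011]
stripe_2_survivors = [0b0000, 0b0110, 0b0100]

print(format(rebuild_block(stripe_1_survivors), "04b"))  # 0100
print(format(rebuild_block(stripe_2_survivors), "04b"))  # 0010
```

Notice that the rebuild does not need to know which of the surviving blocks is the parity block; it just XORs everything that is left.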

Well, I hope you have enjoyed this post. It took me a great deal of searching to finally find the answers about how this works when my own curiosity got to me. I had trouble finding any websites that explained all of the details, so I decided to write this article with the hope that it might satisfy the curiosity of others!