Scott Klarr Jr
How RAID-5 really works
What is RAID?
RAID is an acronym for "Redundant Array of Independent Drives," or "Redundant Array of Inexpensive Drives." The main concept of RAID is the ability to take multiple drives and have them virtualised as a single drive. There are many different RAID structures, and all of them serve at least one of two primary purposes: aggregated storage space or data redundancy (as in, protection against data loss in the event of hard drive failure). I'm not going into the details of all the RAID levels, as this post is geared towards the lower-level workings of RAID-5. If you wish to learn about all the RAID levels, see this Wikipedia Article On RAID.
What is RAID 5?
RAID 5 provides fault tolerance in addition to performance advantages, allowing data to be safeguarded while sacrificing only the equivalent of one drive's space. RAID-5 requires at least three hard drives, ideally of the same size; the total storage space available with a RAID-5 array is equal to { (number of drives - 1) * size of smallest drive }. So if you use three 120 GB hard drives, you will have 240 GB of actual usable space. If you use five 120 GB hard drives, you would have 480 GB of usable space. The more drives you use, the more efficient your storage space becomes, while the array still tolerates the failure of one drive.
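The capacity formula can be written out as a small Python sketch. The function name raid5_usable_capacity and the list-of-sizes input are made up for illustration; they do not come from any RAID tool.

```python
def raid5_usable_capacity(drive_sizes_gb):
    """Usable space of a RAID-5 array: (number of drives - 1) * smallest drive.

    Hypothetical helper for illustration only; sizes are given in GB.
    """
    if len(drive_sizes_gb) < 3:
        raise ValueError("RAID-5 needs at least three drives")
    return (len(drive_sizes_gb) - 1) * min(drive_sizes_gb)


print(raid5_usable_capacity([120, 120, 120]))  # three 120 GB drives -> 240
print(raid5_usable_capacity([120] * 5))        # five 120 GB drives -> 480
```

Because the formula uses the smallest drive, mixing drive sizes wastes the extra space on the larger drives.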
Data Redundancy
Your data can survive a complete failure of one hard drive; however, if two drives fail at the same time, ALL data will be lost. It is very important to have an extra drive on hand so that if a drive fails, you can replace it immediately and start the data rebuild. The RAID-5 array can actually still be used with one drive completely missing or not working, but performance is degraded because the missing data must be rebuilt on the fly. However, if you do not have an extra drive to plug in right away when one fails, it would be wise to keep the computer and all drives powered off until you can replace the failed drive. You may think, "oh it will only be a couple days before the new drive arrives," but ask yourself this: Is not having access to the data on these drives for only a couple days worse than taking the risk of losing it all forever if another drive happens to fail? Probably not.

Figure 1. Representation of RAID-5 data structure.
Striping & Parity
Data is "striped" across the hard drives, with one parity block for each stripe. In figure 1, A, B, C, and D represent data "stripes." The size of each stripe segment per drive can vary; I believe anywhere from 4 KB to 256 KB per segment is normal, and it can be set during setup to adjust performance. The blocks with a subscript P are the parity blocks, which are a representation of the sum of all other blocks in that stripe (explained in more detail below). The parity is responsible for the data fault tolerance and is also the reason why you lose the amount of space equivalent to one drive. Taking notice of figure 1, let's say that the second drive fails. When a new hard drive is put in its place, the RAID controller would rebuild the data automatically. The data in segments A1 and A3 would be combined with the AP parity block, which would allow the data for A2 to be rebuilt. This would take place on each stripe until the entire drive is up to speed, so to speak. Parity blocks are determined by using a logical operation called XOR (Exclusive OR) on binary blocks of data, which is explained further down.
Performance
RAID 5 offers accelerated read performance because the data stream is accessed from multiple drives at the same time. Referring to figure 1, let's say that stripe A was a single file. Normally on a single drive when you open that file, the whole thing would be streamed from the one hard drive bit by bit - thus the one hard drive's max read speed is going to become a bottleneck. BUT, with RAID-5, that one file can be accessed in 1/3 of the time because it will be read from all 3 drives that hold its data at once; block 1 has the first 1/3 of the file, block 2 has the second 1/3 of the file, and block 3 has the last part of the file. This, in a perfect situation, causes your read speed to be tripled - with even more performance potential in RAID-5 arrays containing additional hard drives!
The downside to this is that there is an increased overhead when writing to the drives, caused by the parity calculation. Every single bit written to the drives must be compared and processed to create a parity block. If your intended use involves a lot of data writing (such as video recording, a high traffic server, etc.), RAID-5 would not be the ideal choice.
XOR Comparison
Data is stored and processed at the very lowest levels in the form of binary, which is of course 0s and 1s. There are methods of comparing binary bits called operators. The one that does the magic of parity creation is called XOR, or Exclusive OR. If you have experience in lower-level programming or electronics, you probably already know what an XOR is.

Figure 2. XOR Inputs/Outputs
Basically, an XOR comparison will take two binary bits, compare them, and output a result of 0 or 1. It will return a 1 ONLY IF the two inputs are different. If both bits are 0, the output is 0; if both bits are 1, the output is 0; if one bit is 0 and the other bit is 1, the output is 1.
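The same rule can be checked in a couple of lines of Python. This is just a sketch; xor_bit is a name I made up, and Python's built-in ^ operator does the same thing.

```python
def xor_bit(a, b):
    """Return 1 only if the two input bits differ, otherwise 0."""
    return 1 if a != b else 0


# Print every input combination next to its XOR output.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_bit(a, b))
```

The four printed rows are the full table of XOR inputs and outputs.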

Figure 3. Yellow cells represent parity blocks.
Building Parity
For easier understanding/explaining, we are only going to be working with 4-bit blocks. Actual data blocks can range from 4 KB (32,768 bits) up to 256 KB (2,097,152 bits), but the method is exactly the same regardless of how many consecutive bits you work with. In figure 3, the yellow blocks represent the parities for each stripe. As you may notice, the parities are distributed evenly between all drives. This provides a slight increase in performance and is what separates RAID-5 from RAID-4 (RAID-4 keeps all parities on a single drive).
Let's examine the first stripe of figure 3. To compute the parity, we must run the XOR comparison on each block of data in that stripe. You XOR the first two blocks, then take the result and XOR it against the third block (and continue this for all drives in the array - except for the block where the parity will be stored).
(Drive 1) XOR (Drive 2) = (0100) XOR (0101) = (0001)
(Result) XOR (Drive 3) = (0001) XOR (0010) = (0011)
Let me break that down a little more in case you couldn't follow. Refer to figure 2 if you have trouble remembering the inputs/outputs for XOR.
First we need to compare the first two drives' blocks, which are 0100 and 0101. The very first bit comparison is 0 and 0 (the first bits from both blocks), which results in 0 - the first bit of our temporary parity. The second set of bits is 1 and 1, which results in 0. So far our temporary parity is 00. Now the third bit comparison is 0 and 0, which returns 0. We are now at 000. The fourth bit comparison is 0 and 1, which results in 1. So the result of (Drive 1) XOR (Drive 2) is 0001. We now must take this block and compare it to drive 3, which is 0010. The XOR of 0001 and 0010 equals 0011 - the parity for stripe 1!
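The stripe 1 calculation can be reproduced with a short Python sketch. The function name build_parity is my own invention, and the 4-bit blocks are stored as plain integers; a real controller works on much larger blocks, but the XOR chain is the same idea.

```python
from functools import reduce
from operator import xor


def build_parity(data_blocks):
    """XOR all data blocks of one stripe together to get the parity block."""
    return reduce(xor, data_blocks)


# Stripe 1 from the walkthrough: drives 1-3 hold data, drive 4 gets the parity.
stripe1 = [0b0100, 0b0101, 0b0010]
parity = build_parity(stripe1)
print(format(parity, "04b"))  # prints 0011
```

The order of the blocks does not matter, since XOR gives the same result no matter how the inputs are arranged.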
Recovering Data
The very cool thing about XOR comparisons - and what makes RAID 5 possible - is that if one value comes up missing, you can always find the missing value by doing an XOR comparison on the remaining values! Referring back to figure 3, let's say that drive 1 fails. The RAID controller will alert the user that a drive has failed and must be replaced. As soon as a new drive is put in, the controller will automatically start rebuilding the lost data. Here is how we rebuild drive 1, stripe 1:
(Drive 2) XOR (Drive 3) = (0101) XOR (0010) = (0111)
(Result) XOR (Drive 4) = (0111) XOR (0011) = (0100)
As you can see, the final result is 0100. Now refer back to figure 3 at drive 1, stripe 1... sure enough, it's 0100! Amazing, right? Just for fun, let's rebuild stripe 2 as well, with the assumption that it is drive 1 that has failed.
(Drive 2) XOR (Drive 3) = (0000) XOR (0110) = (0110)
(Result) XOR (Drive 4) = (0110) XOR (0100) = (0010)
The missing block was calculated as 0010. Take a look at figure 3 to verify what drive 1, stripe 2 was before the failure and see if it matches the computed value... of course it does!
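Both rebuilds can be verified with a few lines of Python. As before, this is only a sketch: rebuild_block is a made-up name, and the block values are the ones used in the two examples above.

```python
from functools import reduce
from operator import xor


def rebuild_block(surviving_blocks):
    """XOR every surviving block of a stripe (data and parity alike)
    to recover the block that was on the failed drive."""
    return reduce(xor, surviving_blocks)


# Blocks left on drives 2, 3 and 4 after drive 1 has failed.
stripe1_survivors = [0b0101, 0b0010, 0b0011]
stripe2_survivors = [0b0000, 0b0110, 0b0100]

print(format(rebuild_block(stripe1_survivors), "04b"))  # prints 0100
print(format(rebuild_block(stripe2_survivors), "04b"))  # prints 0010
```

Notice that the rebuild is the very same XOR chain used to build the parity; it does not matter whether the missing block held data or parity.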
Well, I hope you have enjoyed this post. It took me a great deal of searching to finally find the answers about how this works when my own curiosity got to me. I had trouble finding any websites that explained all of the details, so I decided to write this article with the hope that it might satisfy the curiosity of others!



Joe Dec 12, 2007
That is cool!
rob Jan 06, 2008
nice one!
Dave Feb 17, 2008
As the number of drives (and the size of the drives) goes up the chances of a disk failure also go up. At some stage, the chance of two disks failing in the time it takes to replace the first disk approaches 1.
We have a few machines at my work with 48 disks, each holding 500GB. (You can get these boxes now with Terabyte disks...) and if we just made the whole thing a big RAID 5 array then the failure rate would be too scary. As it is, we have some RAID 5 in there along with some RAID 1+0, but we also have hot-spare disks in the machine that are unused so that if any single disk fails we can restore the RAID 5 array back to full health immediately. We keep one hot-spare for every RAID array in the box.
Just something to keep in mind when you are setting your RAID arrays up. Having a couple of cold spares on hand is not such a bad idea either.
craig cowan Jun 02, 2008
for the first time in a week of trying to get my head fully round this i finally did thanks to this document!!!! Thank you!! :)
Binoy Nicholas Jun 07, 2008
To say the least,Fantastic!! Such precise and simple it took only 10 minutes for a novice like me to conceptualize the scenario.. thanks Scott for this nice stuff..
Blue Jun 16, 2008
if we update a single 512 byte block on one disk - will the parity be recalculated for just that block or for the whole stripe block (3 x 32KiB stripe size)
reason I ask is I have seen RAID5 performance drop to around 1MiB/s when cache is saturated which corresponds to reading around 64 blocks
scott klarr Jun 17, 2008
Blue, I believe that any change, even if only a single binary difference, will require that whole stripe to be recalculated.
Anand Aug 04, 2008
What happens when there is data corruption on, let's say, disk 2? We can see that the parity does not add up, but how do we know which disk the error is actually on?
Sahkan Aug 24, 2008
Thanks for writing the article, Great Job!
chandra Sep 12, 2008
if there are 10 HDDs in a RAID 5 array, will data be lost if 2 HDDs fail at the same time?
Scott Klarr Apr 25, 2009
Yes; if you lost two drives at once with RAID-5, you would lose all data. That is why it is crucial to have at least one spare drive on standby that can be swapped in immediately if any drives fail.
However, for the super-paranoid, there is a rarer RAID level of 5+1. It could survive multiple drive failures; however, it is extremely redundant and not very efficient.
RAID 5+1 array capacity: (size of smallest drive) * ( (number of drives / 2) - 1). So an array with ten 120 GB drives (1.2TB total) would have a capacity of 120 GB * ( (10/2) - 1 ) which is equal to only 480 GB.
Paul J. Sep 25, 2008
I was the sysad on 28TB of NetApps RAID that utilized RAID 5, separated into six different RAID groups, with multiple bricks per group. NetApps has redundancy throughout its entire setup, with dual heads, dual data channels, etc. The way that the RAID is setup is that in a RAID group, each brick has a parity drive and hot spare set aside. If there are five bricks, then there are 5 hot spares that can be utilized by any of the other bricks within the group, should a drive fail. You could lose up to 5 drives on any brick before you would start to lose data. It would be very hard to lose any data with this setup as long as you are checking your drives on a daily basis. And no, I don't work for NetApps, I just like how their built-in redundancy make for a very flawless RAID setup.
Scott Klarr Apr 25, 2009
... If only I were rich!
Adarsh Kumar Nov 07, 2008
Very well explained. I enjoyed it.
james Dec 11, 2008
I studied logic gates at college and computer science at university and I still find the mathematics behind how gates work very interesting, although an amateur profession for me as I haven't got the mental pace the industry requires this is a great page which explains the simplicity of data storage in a RAID-5 structured array.
Manzar Dec 13, 2008
Nice & well documented for others to understand the RAID concept in an easy way, once again nice one
Kendall Dec 31, 2008
Raid is designed for disk failure not data corruption.
Cal Jan 30, 2009
I still don't get it coz i'm a novice. I jumped into this page coz iwant to build a server for my little online store...so if i want to build a server for my little online store. should i go for raid 1 or raid 5?
Wayne Martin Apr 10, 2009
OUTSTANDING!!!!!! This is the best explanation I have ever seen on raids! If you are not, you should be an educator.
Abraham Apr 18, 2009
I have a question regarding writing big blocks in RAID 5, I understand that if write small blocks in Raid 5, there will be 4 operations: Read data disk, Read parity disk, compare and calculate, and after write data disk and write parity disk. But what does Raid 5 when it writes Big blocks? does it need READ also? what are the steps to write big blocks?
Thanks.
Scott Klarr Apr 25, 2009
I am honestly not sure about the specific methods used beyond the basic principle I explained. I think you would have to ask someone who has written lower-level RAID software.
James Apr 23, 2009
This is the best explanation of RAID 5 that I have ever read. Everything is clear and very easy to understand. THANK YOU!
John Apr 27, 2009
If i have 3 hard drive setup in a RAID-5 and 1 hard drive fails. Will my data be lost? I know RAID-5 needs a min of 3 Hard drives, but i'm just wondering what would happen if 1 hard drive fail (without a online spare).
Scott Klarr Apr 27, 2009
If you have 3 drives and one fails, you will not lose any data. However you would need to replace the failed drive ASAP because if a second drive fails before you replace the first failed drive, you will lose everything.
Terry May 02, 2009
I have a RAID 5 with three drives, and one of the disks failed. I just got the replacement drive and plugged it in, but the server is saying the array battery is disabled and needs to be charged (error 1794). It went through the POST, but then I got this error: "error loading operating system". Should I be alarmed, or is it because the battery on the array is not charged and therefore no rebuilding of the drive is possible? Or should the OS have loaded and the rebuild been done later? It's an ML530 server. Awaiting a speedy reply, I'm at work now trying to sort it out.
Scott Klarr May 02, 2009
Does the server have any kind of RAID-Array management panel that you can load into during boot?
Scott Klarr May 02, 2009
You might want to try calling the company that manufactured your server if you cannot find any information about how to rebuild the array in the user manual or online.
Terry May 02, 2009
Hi there, I got it up and everything, but my F: and G: drives that had the data are not there, just unallocated space. What can I do?
Terry May 02, 2009
Help, Help
Terry May 03, 2009
The server is up, but I can't say why the other partitions are not there.
At first the server would not load the OS with just two drives; now that is happening, because the drive that I put in there has gone bad also, so I'm back to square one needing a replacement. But what has happened to the other partitions? I just need an explanation please.
SB xx May 19, 2009
The best RAID 5 explanation on internet.
Baso May 26, 2009
Very nice site, what happened Mr. Terry after that were you able to recover?
best to learn from others mistakes :$
Thomas Jun 01, 2009
Excellent explanation ever on RAID 5
Dima Aug 22, 2009
At last I found TECHNICAL information about RAID 5, thanx!
Anto Sep 26, 2009
Very good doc., well explained.
Thanx
Irshad Oct 02, 2009
Hi,
Please confirm whether RAID is software or hardware. Please mention the difference between S/W RAID and H/W RAID.
Thanks and regards,
Irshad M
Scott Klarr Oct 04, 2009
RAID can be found in both software-based and hardware-based solutions. Software-based RAID is much cheaper and relies on the CPU to do the bit comparisons; most onboard motherboard RAID controllers are actually software-based. Hardware-based solutions are typically much more expensive, but they give you better performance.
bassy Nov 23, 2009
This is the best explanation of RAID 5 that I have ever read. Everything is clear and very easy to understand. THANK YOU!
Pointy Haired Boss Nov 27, 2009
thank you very much for the explanation. It's CLEAR!
De Wet Dec 02, 2009
Thank you!!! Earlier today I read how much space a raid 5 array gives and I just could not get my mind around it, where is the redundant data stored? It was driving me crazy. Very clear and well written post.
MrLonandB Dec 03, 2009
Very good explanation. You've taken the simple that has been made complicated -- simple again. Thanks.
David Brownell Dec 06, 2009
Thanks for this clear explanation Scott. I had a RAID 0 setup and one of the 2 identical drives failed (WD Raptor 150GB). I lost everything. I then set up a RAID 5 with the other WD 150GB Raptor, a 160GB and a 500GB drive. Within five days the other WD Raptor also failed! I replaced it with a spare 250GB drive. So now it is all working OK as a 320GB array. Western Digital will replace both drives, BUT, as I understand it, I will not be able to swap the old 160GB and 250GB drives with the new WD 150GB drives, because they are SMALLER. What can I do?
Scott Klarr Jan 21, 2010
I'm not sure about this. My guess is that you are right about not being able to put the smaller drives back in. You'd either have to match the smallest active drive, or you could always back up all the data and start the RAID array from scratch using the new 150 GB drives.
mhz Jan 04, 2010
Hi Scott,
I have 6x320 GB HDD configured with RAID5
Now one of the HDD is failing and I don't have any budget allocation to get a replacement HDD.
Can I just disable the 1xfaulty HDD and run it with 5 HDD now?
OR do I need to recreate my RAID5 setup and start over again from scratch?
Thanks :)
Scott Klarr Jan 21, 2010
You'd have to recreate the RAID configuration from scratch (all data will be lost).
shivaji vaghmode Jan 15, 2010
Please send me all RAID leval details
AMAL Jan 17, 2010
it is very easy to understand
thank you
Mohammed Jan 19, 2010
I need to know what will happen if the parity itself is lost? Is it possible to recalculate it?
Scott Klarr Jan 21, 2010
If the parity is lost, it can be recalculated from the data blocks in the corresponding stripe.
My Name Jan 19, 2010
For those who are still confused about RAID configs:
http://videos.howstuffworks.com/labratstv/837-episode-8-raid-explained-video.htm
guntarsr Feb 04, 2010
oh, very good post. I liked it
jhon Feb 17, 2010
I am using RAID 5 with 10 hard disks. If any 3 disks fail, will I still be able to recover the data??
Krishna Feb 19, 2010
I have a question: why is only the XOR algorithm used for RAID 5 parity calculations? Why not another algorithm?
Does anybody have an idea..??
Scott Klarr Feb 19, 2010
XOR bit comparison is the most efficient way of creating a parity for multiple bits of data. No matter how many drives you line up, it is only going to require one extra matching drive.
Krishna Feb 19, 2010
John,
RAID 5 won't support more than one disk failure. So, you cannot recover data if you lose more than one disk in an array of 10 disks.
By increasing the number of disks, you get more usable capacity out of the volume and the read performance will be good.
But in any case RAID 5 supports only a single disk failure.
Cameron Sice Feb 19, 2010
that was a very informative answer, you should look at a career :)