Re: high throughput storage server?

On 23/02/2011 22:11, Stan Hoeppner wrote:
> David Brown put forth on 2/23/2011 7:56 AM:

>> However, as disks get bigger, the chance of errors on any given disk is
>> increasing.  And the fact remains that if you have a failure on a RAID10
>> system, you then have a single point of failure during the rebuild
>> period - while with RAID6 you still have redundancy (obviously RAID5 is
>> far worse here).

> The problem isn't a 2nd whole drive failure during the rebuild, but a
> URE during rebuild:
>
> http://www.zdnet.com/blog/storage/why-raid-5-stops-working-in-2009/162


Yes, I've read that article - it's one of the reasons for always preferring RAID6 to RAID5.

My understanding of RAID controllers (software or hardware) is that they consider a drive to be either "good" or "bad".  So if you get a URE, the controller considers the drive "bad" and ejects it from the array - it doesn't matter whether it was a single URE or a total disk death.

Maybe hardware RAID controllers do something else here - you know far more about them than I do.

The idea of the md raid "bad block list" is that there is a middle ground - you can have disks that are "mostly good".

Suppose you have a RAID6 array, and one disk has died completely.  It gets replaced by a hot spare, and the rebuild begins.  As the rebuild progresses, disk 1 gets a URE.  Traditional handling would mean disk 1 is ejected, and now you have a double-degraded RAID6 to rebuild.  When you later get a URE on disk 2, you have lost the data for that stripe - and the whole array is gone.

But with bad block lists, the URE on disk 1 just leads to a bad block entry on disk 1, and the rebuild continues.  When you later get a URE on disk 2 (in a different stripe), it's no problem - you use the data from disk 1 and the other disks.  UREs are no longer a killer unless the affected stripe has no redundancy left.
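
To make that concrete, here's a toy model in Python of the two behaviours during the rebuild (just an illustration of the bookkeeping, not how md actually does it - the array size, stripe count and URE positions are made up):

RAID6_REDUNDANCY = 2          # a RAID6 stripe can tolerate two missing blocks
N_STRIPES = 1000              # made-up stripe count
dead_disk = 5                 # the disk that failed completely and is being rebuilt
ures = {1: {200}, 2: {700}}   # disk number -> stripes where a read hits a URE

def lost_stripes(eject_on_ure):
    """Return the stripes that cannot be reconstructed during the rebuild."""
    ejected = {dead_disk}
    lost = []
    for stripe in range(N_STRIPES):
        missing = set(ejected)                 # blocks unavailable in this stripe
        for disk, bad in ures.items():
            if disk not in ejected and stripe in bad:
                missing.add(disk)              # only this one block is unreadable
                if eject_on_ure:
                    ejected.add(disk)          # traditional handling: drop the whole disk
        if len(missing) > RAID6_REDUNDANCY:
            lost.append(stripe)
    return lost

print(lost_stripes(eject_on_ure=True))    # [700] - that stripe is unrecoverable
print(lost_stripes(eject_on_ure=False))   # []    - the rebuild completes

(In reality the ejection would fail the whole rebuild rather than just one stripe, but the point is the same.)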


UREs are also what I worry about with RAID1 (including RAID10) rebuilds.  If a disk has failed, you are right in saying that the chances of the second disk in the pair failing completely are tiny.  But the chances of getting a URE on the second disk during the rebuild are not negligible - they are small, but growing with each new jump in disk size.
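
To put rough numbers on it - assuming the commonly quoted unrecoverable read error rate of one per 10^14 bits for consumer drives, and treating errors as independent (the drive sizes are just examples):

import math

def p_ure(capacity_bytes, ure_rate_bits=1e14):
    # P(at least one URE while reading the whole drive)
    bits = capacity_bytes * 8
    return -math.expm1(bits * math.log1p(-1.0 / ure_rate_bits))

for tb in (0.5, 1, 2, 3):
    print(f"{tb} TB: {p_ure(tb * 1e12):.1%} chance of a URE during a full read")

which gives roughly 3.9% at 0.5 TB, 7.7% at 1 TB, 14.8% at 2 TB and 21.3% at 3 TB - small, but clearly heading the wrong way.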

With md raid's future bad block list and hot replace features, a URE on the second disk during a rebuild is only a problem if the first disk has died completely - if it only had a small problem, the "hot replace" rebuild will be able to use both disks to find the data.

>> I don't know if you've followed the recent "md road-map: 2011" thread (I
>> can't see any replies from you in the thread), but that is my reference
>> point here.

> Actually I haven't.  Is Neil's motivation with this RAID5/6 "mirror
> rebuild" to avoid the URE problem?


I know you are more interested in hardware raid than software raid, but I'm sure you'll find some interesting points in Neil's writings. If you don't want to read through the thread, at least read his blog post.

<http://neil.brown.name/blog/20110216044002>

>> Incidentally, what's your opinion on a RAID1+5 or RAID1+6 setup, where
>> you have a RAID5 or RAID6 built from RAID1 pairs?  You get all the
>> rebuild benefits of RAID1 or RAID10, such as simple and fast direct
>> copies for rebuilds, and little performance degradation.  But you also
>> get multiple-failure redundancy from the RAID5 or RAID6.  It could be
>> that it is excessive - that the extra redundancy is not worth the
>> performance cost (you still have poor small write performance).

> I don't care for and don't use parity RAID levels.  Simple mirroring and
> RAID10 have served me well for a very long time.  They have many
> advantages over parity RAID and few, if any, disadvantages.  I've
> mentioned all of these in previous posts.
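
For what it's worth, here's a rough sketch of the capacity versus worst-case redundancy trade-off I had in mind, for an example 12-drive array (the drive count, and reading RAID1+6 as "RAID6 over mirror pairs", are just my illustration):

layouts = {
    # name: (usable fraction of raw space, drive failures survived in the worst case)
    "RAID10  (6 mirror pairs)":     (6 / 12, 1),    # two failures in one pair can kill it
    "RAID6   (12 drives)":          (10 / 12, 2),   # any third failure kills it
    "RAID1+6 (RAID6 over 6 pairs)": (4 / 12, 5),    # a pair only fails when both halves fail
}

for name, (frac, survives) in layouts.items():
    print(f"{name}: {frac:.0%} usable, survives any {survives} drive failure(s)")

So the layered setup buys a lot of worst-case redundancy, but at only a third of the raw capacity - and it still has the parity small-write penalty.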




