Re: RAID 5 - One drive dropped while replacing another


 



On 02/02/11 17:29, hansbkk@xxxxxxxxx wrote:
On Wed, Feb 2, 2011 at 11:03 PM, Scott E. Armitage
<launchpad@xxxxxxxxxxxxxxxxxxx>  wrote:
RAID1+0 can lose up to half the drives in the array, as long as no single
mirror loses all its drives. Instead of only being able to survive "the
right pair", it's quite the opposite: RAID1+0 will only fail if "the wrong
pair" of drives fails.

AFAICT it's a glass half-full/half-empty thing. Maybe it's just my
personality, but I don't like leaving such things to chance. Maybe if
I had more than two drives per mirror, but that would be **very**
space-inefficient (i.e. a poor usable-to-raw capacity ratio).
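(For a rough sense of scale, taking 12 equal-size drives as an example:
three-way mirrors would leave only 4 drives' worth of usable space, about
33%, versus 50% for mirrored pairs and 10 of 12, roughly 83%, for a
12-drive RAID6.)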

However, following up on the "spare-group" idea, I'd like confirmation
please that this scenario would work:

From the man page:

mdadm may move a spare drive from one array to another if they are in
the same spare-group and if the destination array has a failed drive
but no spares.

Given all component drives are the same size, mdadm.conf contains

ARRAY /dev/md0 level=raid1 num-devices=2 spare-group=bigraid10
ARRAY /dev/md1 level=raid1 num-devices=2 spare-group=bigraid10
etc

I then add any number of spares to any of the RAID1 arrays (which
under RAID 1+0 would in turn be components of the RAID0 span one layer
up - personally I'd use LVM for that layer). The follow/monitor-mode
feature would then move those spares to whichever RAID1 array needed
them - something like the sketch below.
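
Roughly, assuming the ARRAY lines above are in mdadm.conf and a monitor
is running (device names here are only placeholders):

  # attach a hot spare to any one array in the group
  mdadm /dev/md1 --add /dev/sde

  # spare migration only happens while mdadm is running in
  # monitor/follow mode against the same mdadm.conf
  mdadm --monitor --scan --daemonise --mail=root

  # if a drive in md0 later fails, the monitor should move /dev/sde
  # from md1 to md0, since both arrays share spare-group=bigraid10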

Does this make sense?

If so, I'd consider this setup more fault-tolerant than RAID6, with the
big advantage of fast rebuild times - and performance advantages too,
especially on writes - but obviously at a higher cost in raw capacity.

You have to be precise about what you mean by fault-tolerant. With RAID6, /any/ two drives can fail and your system is still running. Hot spares don't change that - they just minimise the time before one of the failed drives is replaced.
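
For comparison, a six-drive RAID6 with one hot spare would be created
along these lines (device names are placeholders):

  # 6 active drives with double parity, plus 1 hot spare
  mdadm --create /dev/md0 --level=6 --raid-devices=6 \
        --spare-devices=1 /dev/sd[b-h]

That array keeps running through any two concurrent drive failures; the
spare only shortens the window in which a third failure would be fatal.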

If you have a set of RAID1 pairs that are striped together (by LVM or RAID0), then you are only guaranteed to tolerate a single failed drive. You /might/ tolerate more failures. For example, if you have 4 pairs and one drive has already failed, a random second failure has a 6/7 chance of being on a different pair, and therefore safe. If you crunch the numbers (see below), the expected number of failures this 4-pair set can tolerate is actually more than 2. But in the guaranteed worst case, your set can only tolerate a single drive failure. Again, hot spares don't change that - they only reduce your degraded (and therefore risky) time.
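
Rough arithmetic for the 4-pair case, assuming failures hit the
remaining drives uniformly at random: the second failure is survivable
with probability 6/7, a third with 6/7 x 4/6 = 4/7, a fourth with
4/7 x 2/5 = 8/35, and a fifth never. The expected number of failures
survived is therefore 1 + 6/7 + 4/7 + 8/35, about 2.7 - better than
RAID6's two on average, but the guaranteed floor is still one drive.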


--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

