RE: RAID 5 - One drive dropped while replacing another

> -----Original Message-----
> From: linux-raid-owner@xxxxxxxxxxxxxxx [mailto:linux-raid-
> owner@xxxxxxxxxxxxxxx] On Behalf Of hansbkk@xxxxxxxxx
> Sent: Wednesday, February 02, 2011 9:29 AM
> To: Roman Mamedov; Robin Hill
> Cc: Bryan Wintermute; linux-raid@xxxxxxxxxxxxxxx
> Subject: Re: RAID 5 - One drive dropped while replacing another
> 
> On Wed, Feb 2, 2011 at 9:28 PM, Roman Mamedov <rm@xxxxxxxxxx> wrote:
> 
> > Exactly, RAID6 would make an order of magnitude more sense.
> > A 15-drive RAID5 array is just one step (one drive failure) from
> becoming a
> > 14-drive RAID0 array (reliability-wise).
> 
> > Would you also ask "what's wrong with having a 14-drive RAID0"?
> 
> Thanks Roman, I just wanted to check that's what you meant.
> 
> 
> On Wed, Feb 2, 2011 at 9:47 PM, Robin Hill <robin@xxxxxxxxxxxxxxx> wrote:
> 
> >> Or is there an upper limit as to the number of drives that's advisable
> >> for any array?
> >>
> > I'm sure there's advice out there on this one - probably a recommended
> > minimum percentage of capacity used for redundancy.  I've not looked
> > though - I tend to go with gut feeling & err on the side of caution.
> >
> >> If so, then what do people reckon a reasonable limit should be for a
> >> RAID6 made up of 2TB drives?

	That depends on many factors.  The bottom-line question is, "how
safe does the live system need to be?"  If taking the system down to recover
from backups is an acceptable liability, then there is no hard limit on the
number of drives.  For that matter, if being down for a long period is not
an unacceptable hardship, or if one is running high-availability mirrored
systems, then a 20-disk RAID0 might be reasonable.
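
	As a rough illustration of why width matters (the per-drive failure
probability below is an assumption for the sake of argument, not a measured
figure), treating drive failures as independent shows how quickly a wide
single-parity array approaches RAID0-like odds:

    # Back-of-the-envelope sketch: probability an N-drive array survives a
    # window in which each drive independently fails with probability p.
    # RAID0 tolerates no failures, RAID5 one, RAID6 two.  p is illustrative.
    from math import comb

    def survives(n, p, tolerated):
        """P(at most `tolerated` of `n` drives fail), simple binomial model."""
        return sum(comb(n, k) * p**k * (1 - p)**(n - k)
                   for k in range(tolerated + 1))

    p = 0.03  # assumed per-drive failure probability over the window
    for n in (6, 10, 15, 20):
        print(f"{n:2d} drives: RAID0 {survives(n, p, 0):.3f}  "
              f"RAID5 {survives(n, p, 1):.3f}  RAID6 {survives(n, p, 2):.3f}")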

> > As the drive capacities go up, you need to be thinking more carefully
> > about redundancy - with a 2TB drive, your rebuild time is probably over
> > a day.  Rebuild also tends to put more load on drives than normal, so is
> > more likely to cause a secondary (or even tertiary) failure.  I'd be
> > looking at RAID6 regardless, and throwing in a hot spare if there's more
> > than 5 data drives.  If there's more than 10 then I'd be going with
> > multiple arrays.
> 
> Thanks for the detailed reply Robin. I'm also sure there's advice "out
> there", but I figure there's no more authoritative place to explore
> this topic than here; I hope people don't mind the tangent.
> 
> So keeping the drive size fixed at 2TB for the sake of argument, do
> people agree with the following as a conservative rule of thumb?
> Obviously adjustable depending on financial resources available and
> the importance of keeping the data online, given the fact that
> restoring this much data from backups would take a loooong time. This
> example is for a money-poor environment that could live with a day or
> two of downtime if necessary.
> 
> less than 6 drives => RAID5
> 6-8 drives ==> RAID6
> 9-12 drives ==> RAID6+spare
> over 12 drives, start spanning multiple arrays (I use LVM in any case)

	That's pretty conservative, yes, for middle-of-the-road
availability.  For a system whose required availability is not too high, it
is considerable overkill.  For a system whose availability is critical, it's
not conservative enough.
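
	For concreteness, here is a small sketch (my own encoding of the
rule of thumb quoted above, with 2TB members assumed) showing what each tier
costs in usable capacity:

    # Sketch of the quoted rule of thumb; parity and spare drives hold no data.
    DRIVE_TB = 2  # assumed member size

    def layout(n):
        """Return (layout name, usable TB) for n drives per the rule of thumb."""
        if n < 6:
            return "RAID5", (n - 1) * DRIVE_TB
        if n <= 8:
            return "RAID6", (n - 2) * DRIVE_TB
        if n <= 12:
            return "RAID6+spare", (n - 3) * DRIVE_TB
        return "multiple arrays", None  # split into smaller arrays first

    for n in (4, 6, 8, 10, 12, 15):
        name, usable = layout(n)
        print(n, name, "split first" if usable is None else f"{usable} TB usable")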

> On Wed, Feb 2, 2011 at 9:29 PM, Mathias Burén <mathias.buren@xxxxxxxxx>
> wrote:
> > With 15 drives, where only 1 can fail (RAID5) without data loss,
> > there's quite a high risk that 2 (or more) drives will fail within a
> > short period of time. If you have fewer drives, this chance decreases.
> > For a large number of drives I personally recommend RAID10 (or RAID1+0,
> > whichever you prefer).
> >
> > RAID6 + 1 hot spare is also nice, and cheaper. (for ~10 drives)
> 
> Mathias, RAID1+0 (not talking about "md RAID10" here) would only
> protect my data if the "right pair" of drives failed at the same time,

	That assumes each RAID1 element has only 2 members.  With 3
members, the reliability goes way up.  Of course, so does the cost.
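
	To put numbers on that trade-off (the pool size and per-drive
failure probability here are assumptions for illustration): a stripe over
mirrors survives only if no mirror set loses all of its replicas, so 3-way
mirrors buy a lot of reliability at a third of the raw capacity:

    # Sketch: survival of a stripe over mirror sets, independent drive failures.
    def raid10_survives(n_sets, replicas, p):
        """P(no mirror set loses every replica), per-drive failure prob p."""
        return (1 - p**replicas) ** n_sets

    p = 0.03        # assumed per-drive failure probability over the window
    drives = 12     # illustrative pool size
    for replicas in (2, 3):
        sets = drives // replicas
        print(f"{replicas}-way mirrors ({sets} sets): usable fraction "
              f"{1/replicas:.2f}, survival {raid10_survives(sets, replicas, p):.5f}")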

> depending on luck, whereas RAID6 would allow **any** two (and
> RAID6+spare any *three*) drives to fail without my losing data. So I

	That's specious.  RAID6 + spare only allows two overlapping
failures.  If the failures don't overlap, then even RAID5 without a spare can
tolerate an unlimited number of failures.  All the hot spare does is allow
for immediate initiation of the rebuild, reducing the probability of a
drive failure during the period of degradation.  It doesn't increase the
resiliency of the array.
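
	A rough way to see what the spare actually buys (the MTBF, rebuild
time, and hands-on swap delay below are all assumed figures): shrinking the
degraded window shrinks the chance that another member fails inside it:

    # Sketch: probability that at least one surviving member fails while the
    # array is degraded; crude exponential model with a spec-sheet-style MTBF.
    from math import exp

    def p_any_failure(survivors, window_hours, mtbf_hours=1_000_000):
        p_one = 1 - exp(-window_hours / mtbf_hours)
        return 1 - (1 - p_one) ** survivors

    survivors = 14      # e.g. a 15-drive array running degraded
    rebuild_h = 12      # assumed rebuild time once a replacement is active
    swap_delay_h = 48   # assumed delay to notice the failure and swap by hand

    print("hot spare (rebuild only):  ", p_any_failure(survivors, rebuild_h))
    print("no spare (swap + rebuild): ", p_any_failure(survivors, swap_delay_h + rebuild_h))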

> thought I'd always prefer RAID6. Or were you perhaps thinking of
> something fancy using "spare pools" or whatever they're called to
> allow for multiple spares to fill in for any failures at the
> underlying RAID1 layer? Now that I think about it, that seems like a
> good idea if it could be made to work, as the simple mirrors do repair
> much faster. But of course the greatly reduced usable disk space ratio
> makes this pretty expensive. . .

	There are lots of strategies for increasing resiliency.  The
compromise is always a three-way competition between cost, speed, and
availability.

> > with a 2TB drive, your rebuild time is probably over a day.
> 
> On my lower-end systems, a RAID6 over 2TB drives takes about 10-11
> hours per failed disk to rebuild, and that's using embedded bitmaps
> and with nothing else going on.

	I've never had one rebuild from a bare drive that fast.
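
	For what it's worth, simple arithmetic (assumed sustained write
rates, not measurements) puts a full 2TB resync anywhere from most of a day
down to a few hours, depending on how fast the drives can stream:

    # Sketch: hours to write a full 2 TB member at an assumed sustained rate.
    DRIVE_BYTES = 2 * 10**12
    for mb_per_s in (30, 60, 100):
        hours = DRIVE_BYTES / (mb_per_s * 10**6) / 3600
        print(f"{mb_per_s:3d} MB/s -> {hours:5.1f} h")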


