Re: A few questions regarding RAID5/RAID6 recovery

On 25/04/2011 19:47, Kővári Péter wrote:
Hi all,

Since this is my first post here, let me first thank all the developers
for their great tool. It really is a wonderful piece of software. ;)

I have heard a lot of horror stories about the event where a member of a
RAID5/6 array gets kicked out due to I/O errors, and then, after the
replacement and during the reconstruction, another drive fails and the
array becomes unusable. (For RAID6, add one more drive to the story and
the problem is the same, so let's just talk about RAID5 for now.) I want
to prepare myself for this kind of unlucky event and build up a strategy
that I can follow once it happens. (I hope never, but...)

Let's assume we have a 4-drive RAID5 that became degraded, the failed
drive was replaced, and then the rebuild process failed, so now we have
an array with two good disks, one failed disk and one that is only
partially synchronized (the new one). We also still have the disk that
was originally kicked out of the array. If I assume that both of the
failed disks have some bad sectors but are otherwise in operative
condition (they can be dd-ed, for example), then, except for the
unlikely event that both disks failed on the very same physical sector
(chunk?), theoretically the data is still there and could be retrieved.
So my question is: can we retrieve it using mdadm and some "tricks"?
I am thinking of something like this:

1. Assemble (or --create --assume-clean) the array in degraded mode
using the two good drives plus whichever of the two failed drives has
its bad sectors located further in than the other one.
2. Add the new drive, let the array start rebuilding, and wait for the
process to go beyond the point where the other failed drive has its
bad sectors.
3. Stop/pause/??? the rebuild process, and - if possible - make a note
of the exact sector (chunk) where the rebuild was paused.
4. Assemble (or --create --assume-clean) the array again, but this time
using the other failed drive.
5. Add the new drive again and continue the rebuild from the point where
the last rebuild was paused. Since we are past the point where this
failed disk has its bad sectors, the rebuild should finish fine.
6. Finally, remove the failed disk and replace it with another new drive.

Can this be done using mdadm somehow?
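
For concreteness, here is a rough sketch of what steps 1-3 could look
like with mdadm and the md sysfs interface. The device names
(/dev/sdb1, /dev/sdc1, /dev/sdd1 as the members used in the first pass,
/dev/sde1 as the replacement) and the array parameters are made-up
examples, and exact sysfs behaviour varies with kernel version, so this
is an outline rather than a tested recipe:

  # Recreate the array metadata in degraded mode without resyncing.
  # Level, chunk size, metadata version and device ORDER must match the
  # original array exactly, otherwise the data will be scrambled.
  mdadm --create /dev/md0 --assume-clean --level=5 --raid-devices=4 \
        --metadata=1.2 --chunk=512 \
        /dev/sdb1 /dev/sdc1 /dev/sdd1 missing

  # Add the replacement drive; recovery onto it starts automatically.
  mdadm --manage /dev/md0 --add /dev/sde1

  # Watch the recovery position ("sectors done / total sectors").
  cat /sys/block/md0/md/sync_completed

  # Once the position is past the other failed drive's bad area,
  # pause the recovery and write down the position reported above.
  echo frozen > /sys/block/md0/md/sync_action

The shaky part is step 5: getting the second pass to resume at exactly
that recorded position, rather than recovering from the beginning again,
is the part I am least sure mdadm currently supports.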

My next question is not really a question but rather a wish. From my
point of view, the situation described above is by far the biggest
weakness not just of Linux software RAID but of all the other hardware
RAID solutions that I know of (I don't know many, though), especially
nowadays, when we use larger and larger disks. So I'm wondering whether
there is any RAID or RAID-like solution that, along with redundancy,
provides some automatic stripe (chunk) reallocation feature - something
like what modern hard disks do with their "reallocated sectors". That
is: the RAID driver reserves some chunks/stripes for "reallocation",
and once an I/O error happens on any of the active chunks, then instead
of kicking the disk out, it marks the stripe/chunk bad, moves the data
to one of the reserved ones, and continues (along with some warning, of
course). Only if writing to the reserved chunk also fails would it be
necessary to kick the member out immediately.

The other thing I wonder about is why the RAID solutions that I know of
use a "first remove the failed drive, then add the new one" strategy
instead of "add the new one, try to recover, then remove the failed
one". They use the former even when a spare drive is available,
because, as far as I know, they won't use the failed disk during the
rebuild. Why? With the latter strategy it would be a joy to recover
from situations like the one above.

Thanks for your response.

Best regards, Peter


You are not alone in these concerns. A couple of months ago there was a long thread here about a roadmap for md raid. The first two entries are a "bad block log" to allow reading of good blocks from a failing disk, and "hot replace" to sync a replacement disk before removing the failing one. Being on a roadmap doesn't mean that these features will make it to md raid in the near future - but it does mean that there are already rough plans to solve these problems.

<http://neil.brown.name/blog/20110216044002>
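
For reference, the "hot replace" item on that roadmap later landed in
md (kernel 3.3+) and mdadm (3.3+). A minimal sketch of how it is
driven, with example device names only:

  # Add a spare, then ask md to rebuild it alongside the failing member
  # before that member is removed from the array.
  mdadm /dev/md0 --add /dev/sde1
  mdadm /dev/md0 --replace /dev/sdb1 --with /dev/sde1

  # Low-level equivalent via sysfs: mark the member as wanting replacement.
  echo want_replacement > /sys/block/md0/md/dev-sdb1/state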




