Re: feature suggestion to handle read errors during re-sync of raid5

Goswin von Brederlow <goswin-v-b@xxxxxx> · Sat, 30 Jan 2010 19:59:16 +0100

Mikael Abrahamsson <swmike@xxxxxxxxx> writes:

> So, a couple of times I've been having the problem of something going
> wrong on raid5, drive being kicked, thus has a lower event number,
> re-add, during the sync a single block on one of the other drives has
> a read error (surprisingly common on WD20EADS 2TB drives), resync
> stops, I have to take down the array, ddrescue the whole read error
> drive to another drive, I lose that block, start up the array
> degraded, and then add the drive again.
>
> It would be nice if there was an option that when re-sync:ing a drive
> which earlier belonged to the array, if there is a read error on
> another drive, just use the parity from the drive being added (in my
> case it's highly likely it'll be valid, and if it's not, then I
> haven't lost anything anyway, because the read error block is gone
> anyway).
>
> Does this make sense? It would of course be nice if the md layer could
> see the difference between sata timeouts and UNC errors, because UNC
> really means something is wrong, whereas sata timeouts might be
> transient problem (?).

Ever looked into adding bitmaps? That way it only syncs the parts where
something changed, is done within minutes and is unlikely to hit another
read error.
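To illustrate why a bitmap shortens the resync, here is a toy model of a write-intent bitmap. It is only a sketch under simplifying assumptions: the class, its methods and the 64 MiB chunk size are hypothetical and do not reflect md's on-disk bitmap format or kernel code. Each bit covers one chunk of the device. Writes made while a member is missing set the bits for the chunks they touch, and on re-add only those chunks are copied. On a real array an internal bitmap can be added to an existing array with `mdadm --grow --bitmap=internal /dev/mdX`.

```python
from dataclasses import dataclass, field


@dataclass
class WriteIntentBitmap:
    """Toy model of a write-intent bitmap (hypothetical, simplified)."""
    device_size: int          # bytes per member device
    chunk_size: int           # bytes covered by one bitmap bit
    dirty: set[int] = field(default_factory=set)

    def mark_write(self, offset: int, length: int) -> None:
        # Set the bit for every chunk overlapped by [offset, offset + length).
        if length <= 0:
            return
        first = offset // self.chunk_size
        last = (offset + length - 1) // self.chunk_size
        self.dirty.update(range(first, last + 1))

    def resync_bytes(self) -> int:
        # Bytes that must be rewritten on re-add; the final chunk of the
        # device may be shorter than chunk_size.
        total = 0
        for chunk in self.dirty:
            start = chunk * self.chunk_size
            end = min(start + self.chunk_size, self.device_size)
            total += end - start
        return total


TB = 10**12
MiB = 2**20

# Example numbers only: a 2 TB member with three writes while it was kicked.
bm = WriteIntentBitmap(device_size=2 * TB, chunk_size=64 * MiB)
bm.mark_write(0, 4096)               # touches chunk 0
bm.mark_write(10 * MiB, 100 * MiB)   # touches chunks 0 and 1
bm.mark_write(1 * TB, 1)             # touches one chunk near the middle
print(len(bm.dirty))                 # prints 3
print(bm.resync_bytes())             # prints 201326592
```

In this example only three 64 MiB chunks (roughly 0.01% of the disk) would be read from the other members and rewritten, instead of the whole 2 TB, which is why the chance of running into a second unreadable sector during the resync drops so much.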

MfG
        Goswin