Re: entire array lost when some blocks unreadable?

Mike Hardy <mhardy@xxxxxxx> · Tue, 07 Jun 2005 14:21:30 -0700

Brad Campbell wrote:

> Join the long line next to the club trophy cabinet :)

It's a shame the line is this long. I wish I had the time to implement
the solution myself, but since I don't, I can't really whine either.
It's still a shame though. Alas.

> Something along those lines.
> Generally if I get an error notification from smartd I pull the drive
> from the array and re-add it. This causes a rewrite of the entire disk
> and everyone is happy. (Unless the drive is dying, in which case the
> rewrite of the entire disk usually finishes it off nicely)

When I get one of those, the first thing I do is verify my backup :-).
The backup is a second array that's on the network, so I typically
remount it read-only at that point.

Then I start drive scans on all drives (primary and backup) to see if
I've got any other blocks that will stop reconstruction. If I find any
other bad blocks on other devices, I immediately remount the primary as
read-only to preserve the data (if it's not already gone) on all of the
disks. Note that my disks almost never get written to, so this actually
does preserve the old data everywhere in all the cases I care about.

After that, a fail and re-add has done the trick for me in the past, but
once I actually got remapped into a bad block. Very annoying. Since
then, I fail the disk and do multiple badblocks passes on it.
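
For anyone following along, here is a rough sketch of that fail /
badblocks / re-add sequence as a small Python helper. It only builds
the command lines and runs nothing, so the list can be reviewed before
anything touches a disk. The function name and the device names in the
usage example are made up for illustration. The mdadm options (--fail,
--remove, --add) and the badblocks options (-s, -v, -p, -w) are the
standard ones, but check the man pages for your versions. Note that -w
is a destructive write test and wipes the device, so it defaults to off
here.

```python
def recovery_plan(array, suspect, passes=2, destructive=False):
    """Return the commands, in order, for the fail / scan / re-add cycle.

    Hypothetical helper: nothing is executed, the caller reviews the
    list and runs each command by hand.
    """
    # -s shows progress, -v is verbose, -p N keeps scanning until N
    # consecutive passes find no new bad blocks.
    badblocks = ["badblocks", "-s", "-v", "-p", str(passes)]
    if destructive:
        # -w is a write-mode test: it overwrites the whole device.
        badblocks.append("-w")
    badblocks.append(suspect)
    return [
        ["mdadm", array, "--fail", suspect],    # mark the member faulty
        ["mdadm", array, "--remove", suspect],  # detach it from the array
        badblocks,                              # scan the detached device
        ["mdadm", array, "--add", suspect],     # re-add, which starts a rebuild
    ]


if __name__ == "__main__":
    # Example device names only; substitute your own.
    for cmd in recovery_plan("/dev/md0", "/dev/sdb1", passes=3):
        print(" ".join(cmd))
```

If badblocks reports anything after the scan, the sensible move is to
stop before the last step and replace the drive rather than re-add it.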

Being able to enable an "aggressively correct" raid mode where any
single-block read error triggered a reconstruct/write/re-read cycle
until it either succeeded or failed outright would be nice. Bonus points
for extra md status markers that mdadm could pick up and mail to folks
depending on policy configuration.
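
To make the idea concrete, here is a toy in-memory model of that
reconstruct/write/re-read cycle using XOR parity across three members.
This is not how md is implemented; ToyDisk, read_with_repair, the
"stuck" set and the three status strings are all invented for
illustration. It assumes a write to a bad block normally lets the drive
remap the sector, and that a block which still fails after a few
rewrites should be reported as failed.

```python
from functools import reduce


class ReadError(Exception):
    pass


class ToyDisk:
    """In-memory stand-in for one member disk (hypothetical)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.bad = set()    # block numbers that currently fail on read
        self.stuck = set()  # block numbers that stay bad even after a rewrite

    def read(self, n):
        if n in self.bad:
            raise ReadError(n)
        return self.blocks[n]

    def write(self, n, data):
        self.blocks[n] = data
        if n not in self.stuck:
            self.bad.discard(n)  # models the drive remapping the sector


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def read_with_repair(disks, idx, n, max_attempts=3):
    """Read block n from disks[idx]; on error, rebuild it from the others.

    Returns (data, status) where status is "clean", "repaired" or
    "failed". A second read error on another member propagates as
    ReadError, since the block can no longer be reconstructed.
    """
    try:
        return disks[idx].read(n), "clean"
    except ReadError:
        pass
    others = [d.read(n) for i, d in enumerate(disks) if i != idx]
    rebuilt = xor_blocks(others)
    for _ in range(max_attempts):
        disks[idx].write(n, rebuilt)          # write the reconstructed data
        try:
            if disks[idx].read(n) == rebuilt:  # re-read to verify
                return rebuilt, "repaired"
        except ReadError:
            continue
    return rebuilt, "failed"
```

The status string is where the extra markers would come from: a
monitor could mail on "repaired" as a warning and on "failed" as a
reason to pull the drive.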

-Mike