Re: raid5, media scans and stripe-wise resync

David Mansfield <md@xxxxxxxxxxxxx> · Mon, 25 Oct 2004 15:47:02 -0400

On Mon, 2004-10-25 at 15:39, Bruce Lowekamp wrote:
> There was a recent conversation on this mailing list about
> transparently recovering from read errors (essentially just rewriting
> the bad stripe and letting the disk handle it), but I think it focused
> on RAID 1.  It would be a natural fit for RAID 5 or 6, but I haven't seen
> an experimental patch to do that.
> 
> If you just want to monitor, look at smartmontools (http://smartmontools.sourceforge.net).
> Each of the drives in my array has a monitoring config:
> /dev/hda -a -o on -S on -R 194 -s (S/../.././02|L/../../6/07) -m lowekamp@xxxxxxxxx
> 

Thanks for the reference.
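For reference, the -s argument in that config line is a regular expression that smartd matches against a key of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 = Monday and 7 = Sunday, hour). A minimal sketch of that matching, assuming the expression has to match the whole key as described in the smartd.conf man page; "scheduled_tests" is a hypothetical helper, not part of smartmontools:

```python
import re
from datetime import datetime


def scheduled_tests(regex: str, when: datetime, types: str = "LSCO") -> list[str]:
    """Return the self-test types whose smartd-style key matches `regex` at `when`.

    Assumption: the key is "T/MM/DD/d/HH" with d = ISO weekday (1=Mon .. 7=Sun),
    and the pattern must match the entire key (hence fullmatch).
    T is L (long), S (short), C (conveyance) or O (offline immediate).
    """
    pattern = re.compile(regex)
    hits = []
    for test_type in types:
        key = f"{test_type}/{when:%m}/{when:%d}/{when.isoweekday()}/{when:%H}"
        if pattern.fullmatch(key):
            hits.append(test_type)
    return hits
```

With the pattern from the config above, Saturday 07:00 selects only the long (surface) test, and 02:00 on any day selects the short test.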

> Two weeks ago I got an email that one disk had a bad read on a sector
> during its weekly long scan (an entire surface scan).  I failed that
> drive manually, waited until it resynced on the spare, overwrote the
> entire drive to let the drive clear the sector (and make sure there
> weren't any other problems), then reran the test and set that drive as
> the spare.
> 
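The manual procedure quoted above can be written out as the command sequence it roughly corresponds to. This is a dry-run sketch only: the device names are hypothetical, it assumes whole disks (not partitions) are the array members, it assumes an mdadm recent enough to support --wait, and nothing is executed. Note that the dd step destroys everything on the drive.

```python
def scrub_member_plan(array: str, disk: str) -> list[list[str]]:
    """Return, in order, the commands for failing, scrubbing and re-adding one member.

    Hypothetical helper: it only builds argv lists and runs nothing.
    Assumes `disk` is itself the array member (whole-disk members).
    """
    return [
        ["mdadm", array, "--fail", disk],                # mark the suspect drive faulty; md rebuilds onto the spare
        ["mdadm", "--wait", array],                      # block until that recovery has finished
        ["mdadm", array, "--remove", disk],              # detach the failed drive from the array
        ["dd", "if=/dev/zero", f"of={disk}", "bs=1M"],   # overwrite the whole drive so it can remap the bad sector
        ["smartctl", "-t", "long", disk],                # start the extended self-test again (runs in the background)
        ["smartctl", "-l", "selftest", disk],            # check the self-test log once the test has had time to finish
        ["mdadm", array, "--add", disk],                 # return the drive to the array as the new spare
    ]
```

Printing each entry with shlex.join gives a checklist that can be reviewed before anything is typed by hand.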

Check out the utility 'scu' at the url: 
http://www.bit-net.com/%7Ermiller/scu.html

It lets you reassign the bad block directly by issuing the SCSI REASSIGN
BLOCKS command.  I've tried the rewrite method you used above, and once or
twice ran into problems with it.

> I'd still feel safer if it automatically overwrote only the sector
> with the read error, but at least this way I knew that the other 9
> drives had passed a surface scan just before, so I wasn't likely to
> run into a second read failure on rebuild.
> 

Yeah.  After scanning all the disks you are reasonably safe.  But if there
are two defects at once (one on the drive being rebuilt and another on a
surviving drive), you are completely screwed.  No way around it, I think.
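The rebuild risk can be put into rough numbers. A back-of-envelope sketch, assuming every bit read from the surviving drives fails independently with the same probability p, so reading n bits gives P(at least one unreadable bit) = 1 - (1 - p)^n. A fresh surface scan arguably makes this model pessimistic, since it has already flushed out latent errors the rebuild would otherwise hit; "p_read_failure" is a hypothetical helper:

```python
import math


def p_read_failure(bits_read: float, bit_error_rate: float) -> float:
    """Probability of at least one unrecoverable read: 1 - (1 - p)^n.

    Assumes independent, identically distributed bit errors.
    log1p/expm1 keep precision when p is tiny and n is huge.
    """
    if not 0.0 <= bit_error_rate < 1.0:
        raise ValueError("bit_error_rate must be in [0, 1)")
    return -math.expm1(bits_read * math.log1p(-bit_error_rate))
```

With hypothetical figures of nine surviving 200 GB drives (about 1.44e13 bits) and p = 1e-14, a rate commonly quoted on desktop drive spec sheets, this comes to roughly 0.13.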

I'd really like a way to resync a single stripe...
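A single-stripe resync boils down to RAID5's parity arithmetic: the parity chunk is the XOR of the data chunks in its stripe, so any one unreadable chunk equals the XOR of all the others and can be recomputed and written back in place. A minimal sketch of that arithmetic only, with hypothetical helper names; it assumes each member's data starts at sector 0 (as with a 0.90 superblock stored at the end of the device), so a bad sector at some member offset belongs to the stripe formed by that same offset on every member. Reading and rewriting the devices is left out:

```python
from functools import reduce


def xor_chunks(chunks: list[bytes]) -> bytes:
    """Bytewise XOR of equal-length chunks (the RAID5 parity operation)."""
    if len({len(c) for c in chunks}) != 1:
        raise ValueError("need at least one chunk, all of the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def rebuild_unreadable(stripe: list[bytes], bad_index: int) -> bytes:
    """Recompute the chunk at bad_index (data or parity) from the rest of the stripe.

    Because parity = XOR of all data chunks, XOR-ing every surviving chunk
    (data and parity alike) yields the missing one.
    """
    survivors = [c for i, c in enumerate(stripe) if i != bad_index]
    return xor_chunks(survivors)
```

For example, data chunks b"\x01\x02", b"\x10\x20" and b"\xff\x00" give parity b"\xee\x22", and dropping the second chunk from the stripe lets rebuild_unreadable recover b"\x10\x20" from the other three.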

David
