Re: Write and verify correct data to read-failed sectors before degrading array?

Neil Brown <neilb@xxxxxxxxxxxxxxx> · Fri, 17 Sep 2004 12:00:33 +1000

On Thursday September 16, linux@xxxxxxxxxxxxxxxx wrote:
>     I have some experimental code that does the read-recovery piece for 
> raid1 devices against kernel 2.4.26.  If an error is encountered on a 
> read, the failure is delayed until the read is retried to the other 
> mirror.  If the retried read succeeds it then writes the recovered block 
> back over the previously failed block. 
>     If the write fails then the drive is marked faulty otherwise we 
> continue without setting the drive faulty.  ( The idea here is that 
> modern disk drives have spare sectors, and will automatically
> reallocate a bad sector to one of the spares on the next write ).
>     The caveat is that if the drive is generating lots of bad/failed 
> reads it's most likely going south, but that's what SMART log
> monitoring is for.  If anyone is interested I can post the patch.

Certainly interested.

Do you have any interlocking to ensure that if a real WRITE is
submitted immediately after (or even during !!!) the READ, it does not
get destroyed by the over-write?
For example:

application     drive0          drive1
READ request
                READ from drive 0
                fails
                                READ from drive 1
                                success. Schedule over-write on drive0
READ completes
WRITE block
                WRITE to drive0 WRITE to drive1

                overwrite happens.

It is conceivable that the WRITE could be sent even *before* the READ
completes, though I'm not sure whether that is possible in practice.
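
To make the race concrete, here is a minimal user-space sketch of one
possible interlock. It is a toy model, not raid1 code: struct mirror,
struct repair, the per-block generation counter and every function name
are hypothetical and exist only for illustration. The idea is that each
normal WRITE bumps a counter for the block, and the deferred over-write
is dropped if the counter has moved since the recovering READ.

```c
#include <stdbool.h>
#include <string.h>

#define NBLOCKS 4
#define BLKSZ   8

/* Toy model of a two-way mirror; all names here are hypothetical. */
struct mirror {
    char data[2][NBLOCKS][BLKSZ];   /* data[drive][block] */
    unsigned gen[NBLOCKS];          /* bumped by every normal WRITE */
};

/* A deferred repair: data recovered from the good drive, to be
 * written back over the block that failed to read. */
struct repair {
    int drive, block;
    unsigned gen_at_read;           /* generation seen by the READ */
    char buf[BLKSZ];
};

/* Normal application WRITE: goes to both drives, bumps the generation. */
void mirror_write(struct mirror *m, int block, const char *buf)
{
    memcpy(m->data[0][block], buf, BLKSZ);
    memcpy(m->data[1][block], buf, BLKSZ);
    m->gen[block]++;
}

/* READ failed on bad_drive: fetch the block from the other drive and
 * remember which generation that data belongs to. */
struct repair recover_read(const struct mirror *m, int bad_drive, int block)
{
    struct repair r = { .drive = bad_drive, .block = block,
                        .gen_at_read = m->gen[block] };
    memcpy(r.buf, m->data[1 - bad_drive][block], BLKSZ);
    return r;
}

/* Apply the deferred over-write.  With interlock set, the repair is
 * dropped if a WRITE has landed since the READ.  Returns true if the
 * block was actually written. */
bool apply_repair(struct mirror *m, const struct repair *r, bool interlock)
{
    if (interlock && m->gen[r->block] != r->gen_at_read)
        return false;               /* stale: newer data is on disk */
    memcpy(m->data[r->drive][r->block], r->buf, BLKSZ);
    return true;
}
```

Calling recover_read(), then mirror_write(), then apply_repair() with
interlock false reproduces the sequence in the diagram: drive0 ends up
holding the old data while drive1 holds the new. With interlock true the
repair is skipped and both drives agree. Note that this only covers a
WRITE that has finished before the over-write is applied; a WRITE still
in flight would need real serialisation per block, which the sketch does
not attempt.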

NeilBrown