On Thursday September 16, linux@xxxxxxxxxxxxxxxx wrote:
> I have some experimental code that does the read-recovery piece for
> raid1 devices against kernel 2.4.26. If an error is encountered on a
> read, the failure is delayed until the read is retried to the other
> mirror. If the retried read succeeds it then writes the recovered block
> back over the previously failed block.
> If the write fails then the drive is marked faulty; otherwise we
> continue without setting the drive faulty. ( The idea here is that
> modern disk drives have spare sectors, and will automatically
> reallocate a bad sector to one of the spares on the next write ).
> The caveat is that if the drive is generating lots of bad/failed
> reads it's most likely going south.. but that's what smart log
> monitoring is for. If anyone is interested I can post the patch.

Certainly interested.

Do you have any interlocking to ensure that if a real WRITE is
submitted immediately after (or even during !!!) the READ, it does not
get destroyed by the over-write?

e.g.

  application          drive0                     drive1
  READ request
                       READ from drive 0 fails
                                                  READ from drive 1 success.
                       Schedule over-write
                         on drive0
  READ completes
  WRITE block
                       WRITE to drive0
                                                  WRITE to drive1
                       overwrite happens.

After that last step drive0 holds the stale recovered data while drive1
holds the new data, so the mirrors no longer match.

It is conceivable that the WRITE could be sent even *before* the READ
completes, though I'm not sure if it is possible in practice.

NeilBrown
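
To make the ordering problem concrete, here is a toy user-space model of the sequence in the timeline. It is only a sketch, not code from the patch or from the md driver: the Mirror class, the per-block generation counter and the replay() helper are all hypothetical names invented for illustration. The counter is one possible interlock, under the assumption that the deferred over-write can simply be dropped when a newer WRITE has already reached both mirrors.

```python
class Mirror:
    """Toy two-way mirror: each drive is a dict of block number -> data."""

    def __init__(self):
        self.drives = [{}, {}]
        self.generation = {}      # block -> number of application WRITEs seen
        self.pending_fixups = []  # deferred recovery over-writes

    def write(self, block, data):
        # An application WRITE goes to both mirrors and bumps the generation.
        self.generation[block] = self.generation.get(block, 0) + 1
        for drive in self.drives:
            drive[block] = data

    def read_with_recovery(self, block, failed_drive):
        # The READ failed on one drive: retry on the other mirror and
        # schedule an over-write of the failed block with the good data.
        good_drive = 1 - failed_drive
        data = self.drives[good_drive][block]
        gen = self.generation.get(block, 0)
        self.pending_fixups.append((failed_drive, block, data, gen))
        return data

    def run_fixups(self, interlock):
        # Perform the deferred over-writes. With the interlock enabled, a
        # fixup is dropped if the block was written since the data was read.
        skipped = 0
        for drive, block, data, gen in self.pending_fixups:
            if interlock and self.generation.get(block, 0) != gen:
                skipped += 1
                continue
            self.drives[drive][block] = data
        self.pending_fixups.clear()
        return skipped


def replay(interlock):
    """Replay the timeline: READ fails on drive0, WRITE, then over-write."""
    m = Mirror()
    m.write(7, "old")                        # initial contents of block 7
    m.read_with_recovery(7, failed_drive=0)  # schedules over-write on drive0
    m.write(7, "new")                        # real WRITE arrives first
    m.run_fixups(interlock)                  # deferred over-write happens
    return m.drives[0][7], m.drives[1][7]


if __name__ == "__main__":
    print("no interlock:  ", replay(interlock=False))
    print("with interlock:", replay(interlock=True))
```

Without the check, drive0 ends up with the stale data and drive1 with the new data; with it, the stale over-write is skipped and both mirrors agree. A real implementation would also have to deal with a WRITE that is in flight at the same moment as the over-write, which this single-threaded model cannot show.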