Re: Write and verify correct data to read-failed sectors before degrading array?


Neil Brown wrote:

On Thursday September 16, linux@xxxxxxxxxxxxxxxx wrote:


I have some experimental code that does the read-recovery piece for raid1 devices against kernel 2.4.26. If an error is encountered on a read, the failure is delayed until the read has been retried against the other mirror. If the retried read succeeds, it then writes the recovered block back over the previously failed block. If that write fails, the drive is marked faulty; otherwise we continue without setting the drive faulty. (The idea here is that modern disk drives have spare sectors and will automatically reallocate a bad sector to one of the spares on the next write.) The caveat is that if the drive is generating lots of bad/failed reads, it's most likely going south, but that's what SMART log monitoring is for. If anyone is interested I can post the patch.
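As a rough illustration of the recovery flow described above, here is a minimal user-space sketch in Python. It is not the kernel patch, which was not posted in this message: `Mirror`, `raid1_read`, and the `bad`/`write_fails` fault-injection knobs are hypothetical names invented for this example, and the "remap on write" behaviour is modelled simply by clearing the bad-sector flag.

```python
class Mirror:
    """Toy in-memory disk (hypothetical). `bad` holds sectors whose reads fail."""

    def __init__(self, blocks, bad=(), write_fails=False):
        self.blocks = dict(blocks)
        self.bad = set(bad)
        self.write_fails = write_fails
        self.faulty = False

    def read(self, sector):
        if sector in self.bad:
            raise IOError(f"read error at sector {sector}")
        return self.blocks[sector]

    def write(self, sector, data):
        if self.write_fails:
            raise IOError(f"write error at sector {sector}")
        self.blocks[sector] = data
        # Assumption: a successful write lets the drive remap the sector.
        self.bad.discard(sector)


def raid1_read(mirrors, sector):
    """Read a sector, retrying on other mirrors and rewriting failed copies."""
    failed = []
    for mirror in mirrors:
        if mirror.faulty:
            continue
        try:
            data = mirror.read(sector)
        except IOError:
            # Delay the failure: remember this mirror and try the next one.
            failed.append(mirror)
            continue
        # A read succeeded: write the good data back over each failed copy.
        for bad_mirror in failed:
            try:
                bad_mirror.write(sector, data)
            except IOError:
                # Only a failed recovery write marks the drive faulty.
                bad_mirror.faulty = True
        return data
    raise IOError(f"sector {sector} unreadable on all mirrors")
```

With a first mirror that has one unreadable sector and a healthy second mirror, `raid1_read([d0, d1], sector)` returns the data from the second mirror and leaves the first mirror in service with the block rewritten. If the first mirror also rejects the write, it ends up with `faulty` set.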



Certainly interested.

Do you have any interlocking to ensure that if a real WRITE is
submitted immediately after (or even during !!!) the READ, it does not
get destroyed by the over-write?
e.g.

application     drive0          drive1
READ request
                READ from drive0
                fails
                                READ from drive1
                                success; schedule over-write on drive0
READ completes
WRITE block
                WRITE to drive0 WRITE to drive1

                over-write happens


It is conceivable that the WRITE could be sent even *before* the READ completes though I'm not sure if it is possible in practice.

NeilBrown



No, there is no interlocking at this time. I solve the above problem by not replying to the read until the recovery write attempt has either failed or completed. This works well when the layer above us (such as a filesystem) uses the buffer cache or otherwise guarantees no read-write conflicts. (I believe this is the case with buffered block devices at this time.) Using /dev/raw with an application that can cause read-write conflicts WILL result in corruption. This is why the patch is experimental. :)
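To make the ordering argument concrete, here is a small deterministic Python model of the two completion policies. It is only a sketch: `run` and its `complete_read_before_writeback` flag are hypothetical, the block contents are placeholder strings, and it assumes (as the buffer-cache case does) that the application issues its WRITE only after the READ has completed. For the early-completion case it picks the worst interleaving, where the pending over-write lands after the application's WRITE.

```python
def run(complete_read_before_writeback):
    """Return the final (drive0, drive1) contents of one block."""
    drive0 = {"blk": None}    # None stands for the unreadable sector
    drive1 = {"blk": "old"}

    # The read on drive0 fails; the retried read on drive1 succeeds.
    recovered = drive1["blk"]

    def writeback():
        # Recovery over-write of the failed block on drive0.
        drive0["blk"] = recovered

    def app_write():
        # A real WRITE from the application goes to both mirrors.
        drive0["blk"] = "new"
        drive1["blk"] = "new"

    if complete_read_before_writeback:
        # READ completes early, so the WRITE can overtake the over-write.
        order = [app_write, writeback]
    else:
        # READ completion is held until the over-write has finished.
        order = [writeback, app_write]

    for step in order:
        step()
    return drive0["blk"], drive1["blk"]
```

In this model `run(True)` leaves the mirrors out of sync, with stale data on drive0, while `run(False)` leaves both mirrors holding the new data. Holding back the read completion does nothing for a caller that can issue a conflicting WRITE while the READ is still outstanding, which is the /dev/raw case.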

I've tested the code with a fault injector and have not been able to cause corruption using ext3 or xfs.

-Sebastian
