Hi Alexander,

On 12/18/2017 10:51 AM, Alexander Shenkin wrote:
> Hi all,
>
> I'm getting back to this now that I'll have time, apologies for the
> delay.  So, is the following correct in the case of a read error?

Not quite.

> 1) System tries to read an unreadable sector
> 2) Drive timeout reports unreadable based on drive timeout setting.
> 2a) In this case, mdadm sees the sector is unreadable and rewrites it
> elsewhere on that drive.

No.  MD reconstructs the sector from redundancy (mirror or reverse
parity calc or reverse P+Q syndrome) and writes it back to the *same*
sector.  Since the drive firmware reported an error here, it knows to
verify the write as well.  If the verification fails, the drive
firmware will relocate the sector in the background, invisible to the
upper layers.  As far as MD is concerned, that sector address is fixed
either way.

Relocations are handled entirely within the drive.  MD does not perform
or track relocations.

> 3) If linux hangcheck timer runs out before the drive timeout, then
> linux aborts the read, logs an error, and mdadm isn't given a chance
> to rewrite elsewhere based on checksums.

No.  The hangcheck timer issue described in your forwarded email is
unrelated.  And MD doesn't use checksums.

Each drive has a device driver timeout, as you note below, found at
/sys/block/*/device/timeout, that linux's ATA/SCSI stack uses to cut
off non-responsive controller cards and/or drives.  If that timer runs
out on a read before the drive reports the read error, the low level
*driver* reports a read error to the MD layer.

MD treats it the same as any other read error, locating or recomputing
the sector from redundancy as above.  The difference in this case is
that the physical drive isn't talking to the controller (link reset in
progress, typically) and the corrective rewrite of the sector (to fix
or relocate within the drive) is refused, and that write error causes
MD to kick out the drive.  And the pending sector is also left unfixed.

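
To see what the driver timeout is currently set to on each drive,
something like the following rough Python sketch should do.  Only the
/sys/block/*/device/timeout path comes from the discussion above; the
function name read_driver_timeouts and its sys_block parameter are
made up for illustration.  The values in that file are in seconds.

```python
import glob
import os


def read_driver_timeouts(sys_block="/sys/block"):
    """Return {device_name: timeout_seconds} for every block device
    that exposes a device/timeout attribute under sys_block.

    sys_block is a hypothetical parameter, added so the function can
    be pointed at a scratch directory instead of the real sysfs tree.
    """
    timeouts = {}
    pattern = os.path.join(sys_block, "*", "device", "timeout")
    for path in sorted(glob.glob(pattern)):
        # path looks like <sys_block>/<name>/device/timeout
        name = os.path.basename(os.path.dirname(os.path.dirname(path)))
        try:
            with open(path) as f:
                timeouts[name] = int(f.read().strip())
        except (OSError, ValueError):
            # Skip attributes that are unreadable or not a plain integer.
            continue
    return timeouts


if __name__ == "__main__":
    for name, seconds in read_driver_timeouts().items():
        print(f"{name}: {seconds}s")
```

Devices without a device/timeout attribute are simply left out of the
result.  The number that matters is how each value compares with the
time the drive itself may spend on error recovery: if the driver timer
is the shorter of the two, you end up in the failure case described
above.
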
> Given all this, it seems to me that I should now set the hangcheck
> timer to something greater than drive timeout (180 seconds).  Does
> that sound right?  Otherwise, linux will kill the rewrite again, no?

In and of itself, waiting on I/O is not a hang.  So it should not be
applicable.

Phil