Now that Tejun has put in the enhanced error handling (which is a big
jump forward), I have been trying to test and validate the code and the
assumptions.
Having spent far too much time on planes recently, broken only by
spending the other part of my time helping do root cause failure
analysis of drives, I have been questioning the validity of the way we
currently derate our P-ATA and S-ATA connected drives from DMA to slower
DMA to PIO and then spiral on down.
All of this is a long-winded way of asking if this step down is ever
valid for either S-ATA (or even modern P-ATA) drives.
From what I see and what I hear from the way my colleagues handle drive
errors in non-Linux code, this seems to be very aggressive and most
likely not justified with modern drives and HBAs.
Derating should probably never happen on normal drive errors - even
those that might take tens of seconds. Often, drives will try really,
really hard to recover and may only respond once they give up
internally, which can take up to 30 seconds.
Also, NACKs from unsupported commands or any type of media error
should not kick off this sequence.
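
To make the proposed policy concrete, here is a rough sketch in
standalone C. It is not libata code: the error categories, the
should_derate() helper and the threshold are all hypothetical names
made up for illustration. It also assumes that repeated interface CRC
errors are the one case where stepping down might still be justified;
that is an assumption of the sketch, not something established above.

```c
#include <stdbool.h>

/* Hypothetical threshold: link errors seen recently before stepping down. */
#define DERATE_LINK_ERR_THRESHOLD 3

/* Hypothetical error categories; these are not libata's definitions. */
enum err_kind {
	ERR_MEDIA,         /* media error reported by the drive */
	ERR_CMD_ABORTED,   /* drive NACKed an unsupported command */
	ERR_SLOW_RECOVERY, /* drive answered, but only after a long internal retry */
	ERR_LINK_CRC       /* CRC error on the interface (assumed derating case) */
};

/*
 * Return true only when lowering the transfer mode could plausibly help.
 * Errors that originate inside the drive never trigger a step down.
 */
bool should_derate(enum err_kind kind, unsigned int recent_link_errors)
{
	switch (kind) {
	case ERR_MEDIA:
	case ERR_CMD_ABORTED:
	case ERR_SLOW_RECOVERY:
		/* A slower transfer mode cannot fix any of these. */
		return false;
	case ERR_LINK_CRC:
		/* Assumption: only repeated link errors justify derating. */
		return recent_link_errors >= DERATE_LINK_ERR_THRESHOLD;
	}
	return false;
}
```

With this shape, a media error never lowers the transfer mode no
matter how often it occurs, while a single link CRC error is tolerated
and only a run of them would start the step down.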
Would this be a reasonable thing for a config option? Or would it be
better to add yet another blacklist for devices that might have a
justified need for this derating?