error handling - DMA to PIO step down sequence

Ric Wheeler <ric@xxxxxxx> · Wed, 20 Sep 2006 15:03:27 -0400

Now that Tejun has put in the enhanced error handling (which is a big 
jump forward), I have been trying to test and validate the code and the 
assumptions.

Having spent far too much time on planes recently, broken only by 
spending the other part of my time helping do root cause failure 
analysis of drives, I have been questioning the validity of the way we 
currently derate our P-ATA and S-ATA connected drives from DMA to slower 
DMA to PIO and then spiral on down.

All of this is a long-winded way of asking whether this step down is ever 
valid for S-ATA (or even modern P-ATA) drives.

From what I see, and from what I hear about the way my colleagues handle 
drive errors in non-Linux code, this step down seems very aggressive and 
most likely not justified with modern drives and HBAs.

Derating should probably never happen on normal drive errors, even 
those that might take tens of seconds.  Often, drives will try really, 
really hard to recover and may only respond once they have given up 
internally, which can take up to 30 seconds.

Also, NACKs from unsupported commands or any type of media error 
should not kick off this sequence.
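
As a rough sketch of the policy argued for above: classify the error
first, and only consider a slower transfer mode for errors that look like
transfer problems. All names below are hypothetical and are not taken
from libata. Treating repeated interface CRC errors as the one case that
may justify derating, and the threshold of 3, are assumptions made purely
for illustration.

```c
#include <stdbool.h>

/* Hypothetical error classes, for illustration only (not libata names). */
enum err_class {
	ERR_MEDIA,        /* media error reported by the drive */
	ERR_CMD_ABORTED,  /* drive NACKed an unsupported command */
	ERR_TIMEOUT,      /* drive was slow to answer, possibly tens of seconds */
	ERR_BUS_CRC,      /* data corrupted in transfer (assumed derate candidate) */
};

/* Assumed threshold: consecutive transfer errors before slowing down. */
#define BUS_ERR_LIMIT 3

/*
 * Return true only when a slower transfer mode could plausibly help.
 * Media errors, aborted commands and slow recoveries never derate.
 */
static bool should_derate(enum err_class cls, unsigned int bus_err_count)
{
	switch (cls) {
	case ERR_MEDIA:
	case ERR_CMD_ABORTED:
	case ERR_TIMEOUT:
		return false;
	case ERR_BUS_CRC:
		return bus_err_count >= BUS_ERR_LIMIT;
	}
	return false;
}
```

With this sketch, should_derate(ERR_TIMEOUT, n) is false for any n, so a
drive that takes 30 seconds to give up on a bad sector would stay in its
current DMA mode.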

Would this be a reasonable thing for a config option? Or would it be 
better to add yet another blacklist for devices that might have a 
justified need for this derating?
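
To make the two alternatives concrete, here is a minimal sketch that
combines them: a global switch standing in for the config option, plus a
per-model list of devices that are allowed to derate. Everything here is
hypothetical, including the variable names and the placeholder model
strings, and none of it reflects existing libata code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a config option; derating is off by default. */
static bool allow_derating = false;

/* Hypothetical list of models with a justified need for derating.
 * The entries are placeholders, not real device model strings. */
static const char *const derate_list[] = {
	"EXAMPLE MODEL A",
	"EXAMPLE MODEL B",
};

/* Return true if the step down sequence may be used for this model. */
static bool model_may_derate(const char *model)
{
	size_t i;

	if (allow_derating)
		return true;

	for (i = 0; i < sizeof(derate_list) / sizeof(derate_list[0]); i++) {
		if (strcmp(model, derate_list[i]) == 0)
			return true;
	}
	return false;
}
```

The design choice in this sketch is that the list is consulted only when
the global switch is off, so the default behaviour is "never derate"
unless a device is explicitly listed.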
