On Thu, 4 Dec 2008, Mike Anderson wrote:

> Previously I had submitted some patches on scsi mid retry with a short text
> on current retry policy (this cover mid retry policy vs
> scsi_io_completion, which should be unified).
>
> http://marc.info/?l=linux-scsi&m=122210133628085&w=2
>
> I will try to refresh my patches with a updated policy document and also
> align that with the changes to scsi_io_completion posted prior to
> re-submit.

Your policy discussion needs to be expanded.  And it needs to apply to
scsi_io_completion() as well as scsi_decide_disposition().

As I see it, the set of possible retry actions is as follows:

    1: Don't retry at all.  This is appropriate for certain kinds
       of errors (such as LBA out of range); you know that they
       will never succeed no matter how many times you try them.

    2: Keep on retrying until the request times out.  This is
       appropriate in only a few circumstances (like the tape
       arrays James mentioned earlier).

    3: Retry a few times, generally with a short delay between
       attempts, and then give up.  I favor a total of 3 attempts
       but the current code tends to use 6 -- okay, fine.

What's needed is a clear classification of errors into these three
cases; that's what your policy document tries to do.  However the
implementation of case (3) in particular needs to be fixed, since the
code does not limit the number of retries correctly.

By the way, there seems to be some confusion over how to handle
HARDWARE ERROR (SK = 4).  The spec says "nonrecoverable".  This does
not mean non-retryable!  "Nonrecoverable" means that the hardware was
unable to recover from the error.  But it still might be a transient
error, and it might go away if the command was tried again.  In fact,
the spec specifically mentions "parity error" as a possible cause;
certainly a parity error might go away the next time the command is
issued.

So the whole idea of the retry_hwerr flag is bogus; hardware errors
should always be retried.
Or perhaps only the name is bogus, since the flag really indicates that
the command should be tried over and over again without pause until it
succeeds or the request times out (whereas normally hardware errors
should be retried only a few times).

Alan Stern
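
For illustration only, the three retry cases above could be sketched
roughly as follows.  This is not the kernel's code: the enum, the
function names, the retry_forever parameter (standing in for what
retry_hwerr really seems to mean) and MAX_ATTEMPTS are all hypothetical,
and treating every unlisted error as case (3) is an assumption made
just to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; a sketch, not the mid-layer's code. */
enum retry_action {
	RETRY_NEVER,		/* case 1: fail at once */
	RETRY_UNTIL_TIMEOUT,	/* case 2: retry until the request times out */
	RETRY_LIMITED,		/* case 3: retry a few times, then give up */
};

#define SK_HARDWARE_ERROR	0x4	/* sense key: HARDWARE ERROR */
#define SK_ILLEGAL_REQUEST	0x5	/* sense key: ILLEGAL REQUEST */
#define ASC_LBA_OUT_OF_RANGE	0x21	/* LOGICAL BLOCK ADDRESS OUT OF RANGE */

/* Assumed limit: 3 total attempts (the current code tends to use 6). */
#define MAX_ATTEMPTS		3

/* Map an error to one of the three cases. */
static enum retry_action classify_error(unsigned char sense_key,
					unsigned char asc,
					bool retry_forever)
{
	/* Will never succeed no matter how often it is tried. */
	if (sense_key == SK_ILLEGAL_REQUEST && asc == ASC_LBA_OUT_OF_RANGE)
		return RETRY_NEVER;

	/* "Nonrecoverable" may still be transient, so always retry. */
	if (sense_key == SK_HARDWARE_ERROR)
		return retry_forever ? RETRY_UNTIL_TIMEOUT : RETRY_LIMITED;

	/* Assumption for this sketch: everything else is case 3. */
	return RETRY_LIMITED;
}

/*
 * Decide whether to issue the command again.  attempts_made counts
 * every attempt so far, including the one that just failed.
 */
static bool should_retry(enum retry_action action, int attempts_made,
			 bool timed_out)
{
	assert(attempts_made >= 1);

	switch (action) {
	case RETRY_NEVER:
		return false;
	case RETRY_UNTIL_TIMEOUT:
		return !timed_out;
	case RETRY_LIMITED:
		/* The bound is on total attempts, checked every time. */
		return !timed_out && attempts_made < MAX_ATTEMPTS;
	}
	return false;
}
```

The point of keeping the classification separate from the retry
decision is that the attempt limit for case (3) gets enforced in
exactly one place, whichever path reported the error.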