On 9/15/23 4:34 PM, Martin Wilck wrote: > On Fri, 2023-09-15 at 22:21 +0200, Martin Wilck wrote: >> On Tue, 2023-09-05 at 18:15 -0500, Mike Christie wrote: >>> This has read_capacity_16 have scsi-ml retry errors instead of >>> driving >>> them itself. >>> >>> There are 2 behavior changes with this patch: >>> 1. There is one behavior change where we no longer retry when >>> scsi_execute_cmd returns < 0, but we should be ok. We don't need to >>> retry >>> for failures like the queue being removed, and for the case where >>> there >>> are no tags/reqs since the block layer waits/retries for us. For >>> possible >>> memory allocation failures from blk_rq_map_kern we use GFP_NOIO, so >>> retrying will probably not help. >>> 2. For the specific UAs we checked for and retried, we would get >>> READ_CAPACITY_RETRIES_ON_RESET retries plus whatever retries were >>> left >>> from the loop's retries. Each UA now gets >>> READ_CAPACITY_RETRIES_ON_RESET >>> reties, and the other errors (not including medium not present) get >>> up >>> to 3 retries. >> >> This is ok, but - just a thought - would it make sense to add a field >> for maximum total retry count (summed over all failures)? That would >> allow us to minimize behavior changes also in other cases. > > Could we perhaps use scmd->allowed for this purpose? > > I noted that several callers of scsi_execute_cmd() in your patch set > set the allowed parameter, e.g. to sdkp->max_retries in 07/34. > But allowed doesn't seem to be used at all in the passthrough case, I think scmd->allowed is still used for errors that don't hit the check_type goto in scsi_no_retry_cmd. If the user sets up scsi failures for only UAs, and we got a DID_TRANSPORT_DISRUPTED then we we do scsi_cmd_retry_allowed which checks scmd->allowed. In scsi_noretry_cmd we then hit the: if (!scsi_status_is_check_condition(scmd->result)) return false; and will retry. > so we might as well use it as an upper limit for the total number of > retries. > If you really want an overall retry count let me know. I can just add a total limit to the scsi_failures struct. You have been the only person to ask about it and it didn't sound like you were sure if you wanted it or not, so I haven't done it.