Hi Mike, On Tue, 2022-08-09 at 22:41 -0500, Mike Christie wrote: > In several places like LU setup and pr_ops we will hit the noretry > code > path because we do not retry any passthrough commands that hit device > errors even though scsi-ml thinks the command is retryable. > > This has us only fast fail commands that hit device errors that have > been > submitted through the block layer directly and not via scsi_execute. > This > allows SG IO and other users to continue to get fast failures and all > device errors returned to them, and scsi_execute users will get their > retries they had requested. > > Signed-off-by: Mike Christie <michael.christie@xxxxxxxxxx> Thanks a lot. I like the general approach, but I am concerned that by treating every command sent by scsi_execute() or scsi_execute_req() as a retryable command, we may break some callers, or at least modify their semantics in unexpected ways. For example, spi_execute(), scsi_probe_lun(), scsi_report_lun_scan() currently retry only on UA. With this change, these commands will be retried in additional cases, without the caller noticing (see 949bf797595fc ("[SCSI] fix command retries in spi_transport class"). It isn't obvious to me that this is correct in all affected cases. Note that the retry logic of the mid level may change depending on the installed device handler. For example, ALUA devices will endlessly retry UA with ASC=29, which some callers explicitly look for. I believe we need to review all callers that have their own retry loop and /or error handling logic. This includes sd_spinup_disk(), sd_sync_cache(), scsi_test_unit_ready(). SCSI device handlers treat some sense keys in special ways, or may retry commands on another member of a port group (see alua_rtpg()). DID_TIME_OUT is a general concern - no current caller of scsi_execute() expects timed-out commands to be retried, and doing so has the potential of slowing down operations a lot. I am aware that my recent patch changed exactly this for scsi_probe_lun(), but doing the same thing for every scsi_execute() invocation sounds at least somewhat dangerous. Bottom line: I wonder if SUBMITTED_BY_SCSI_EXEC is suffient to determine that a command should be retried like an ordinary block level command. Regards, Martin > --- > drivers/scsi/scsi_error.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c > index ac4471e33a9c..573d926220c4 100644 > --- a/drivers/scsi/scsi_error.c > +++ b/drivers/scsi/scsi_error.c > @@ -1806,7 +1806,8 @@ bool scsi_noretry_cmd(struct scsi_cmnd *scmd) > * assume caller has checked sense and determined > * the check condition was retryable. > */ > - if (req->cmd_flags & REQ_FAILFAST_DEV || > blk_rq_is_passthrough(req)) > + if (req->cmd_flags & REQ_FAILFAST_DEV || > + scmd->submitter == SUBMITTED_BY_BLOCK_PT) > return true; > > return false;