On 2024/08/03 13:42, Mike Christie wrote: > On 7/31/24 3:22 PM, Bart Van Assche wrote: >> >> >> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c >> index da7dac77f8cd..e21becc5bcf9 100644 >> --- a/drivers/scsi/scsi_error.c >> +++ b/drivers/scsi/scsi_error.c >> @@ -1816,6 +1816,12 @@ bool scsi_noretry_cmd(struct scsi_cmnd *scmd) >> * assume caller has checked sense and determined >> * the check condition was retryable. >> */ >> - if (req->cmd_flags & REQ_FAILFAST_DEV || blk_rq_is_passthrough(req)) >> - return true; >> + if (req->cmd_flags & REQ_FAILFAST_DEV) >> + return true; >> + if (/* submitted by the SCSI core */) >> + return false; >> + if (blk_rq_is_passthrough(req)) >> + return true; > > Do you just want to retry UAs or all internal passthrough command > errors that go through here? I think that we definitely do *not* want to retry all internal commands. E.g. I do not see the point in retrying INQUIRY or VPD page accesses failing on device scan. > > For just the specific UA or NOT_READY/0x04/0x01 case for an example. > Does every scsi core passthrough caller want that retried? If so, then > I can see where you are coming from where you feel it's a bug. I would > agree we would normally want to retry that in general. Maybe others know > about some specific old case though. I think the command Bart had a problem with is START_STOP_UNIT. If that is the only case causing problems, then we should probably improve sd_start_stop_device() to allow retries. But if there are more commands that needs the same, then the patch that Bart is proposing makes sense I think, but it will require an audit of all REQ_OP_DRV_XXX internal users to make sure that they all use that command with -> allowed set to 0. But hopefully, most cases go through scsi_execute_cmnd(), which should simplify this audit. > > However, I'm not sure for MEDIUM_ERROR or ABORTED_COMMAND. > I think MEDIUM_ERROR probably would not come up for the cases we are > talking about though. > > I don't think we want to always retry DID_TIME_OUT though. The funny > thing is that I think Martin just wanted to retry one specific case > for that error. We had to do the scsi_check_passthrough patches so > we could retry just for scsi_probe_lun. -- Damien Le Moal Western Digital Research