On 1/28/21 1:51 AM, michael.christie@xxxxxxxxxx wrote:
On 1/26/21 7:02 AM, Hannes Reinecke wrote:
When a command is returned with DID_TRANSPORT_DISRUPTED, we should
be checking the REQ_FAILFAST_TRANSPORT flag and not retry
the command if it is set.
Otherwise multipath will keep requeuing the command on the failed
path instead of failing it over to one of the working paths.
Cc: Martin Wilck <martin.wilck@xxxxxxxx>
Signed-off-by: Hannes Reinecke <hare@xxxxxxxx>
---
drivers/scsi/scsi_error.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index a52665eaf288..005118385b70 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1753,6 +1753,7 @@ int scsi_noretry_cmd(struct scsi_cmnd *scmd)
 	case DID_TIME_OUT:
 		goto check_type;
 	case DID_BUS_BUSY:
+	case DID_TRANSPORT_DISRUPTED:
 		return (scmd->request->cmd_flags & REQ_FAILFAST_TRANSPORT);
 	case DID_PARITY:
 		return (scmd->request->cmd_flags & REQ_FAILFAST_DEV);
We don't fast fail for that error code to avoid churn for transient
transport problems. The FC and iSCSI drivers block the rport/session,
return that code, and then it's up to the fast_io_fail/replacement timeout.
_But_ it prevents that command from being failed over to another path, so
essentially we're blocking execution until fast_io_fail_tmo expires,
for no good reason, as we have other paths available.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@xxxxxxx +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer