On 1/26/21 7:02 AM, Hannes Reinecke wrote:
When a command is return with DID_TRANSPORT_DISRUPTED we should be
looking at the REQ_FAILFAST_TRANSPORT flag and do not retry the
command if set.
Otherwise multipath will be requeuing a command on the failed path
and not fail it over to one of the working paths.

Cc: Martin Wilck <martin.wilck@xxxxxxxx>
Signed-off-by: Hannes Reinecke <hare@xxxxxxxx>
---
 drivers/scsi/scsi_error.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index a52665eaf288..005118385b70 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1753,6 +1753,7 @@ int scsi_noretry_cmd(struct scsi_cmnd *scmd)
 	case DID_TIME_OUT:
 		goto check_type;
 	case DID_BUS_BUSY:
+	case DID_TRANSPORT_DISRUPTED:
 		return (scmd->request->cmd_flags & REQ_FAILFAST_TRANSPORT);
 	case DID_PARITY:
 		return (scmd->request->cmd_flags & REQ_FAILFAST_DEV);
We don't fast fail for that error code, to avoid churn on transient transport problems. The FC and iSCSI drivers block the rport/session and return that code, and then it's up to the fast_io_fail/replacement timeout.
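
To make the difference concrete, here is a minimal standalone sketch of the decision being discussed. It is a simplified model, not the kernel code: the enum, the flag macros, and noretry_model() are hypothetical stand-ins, and the "unpatched" branch assumes the disrupted code simply leads to a retry, as described above.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the kernel's definitions or values. */
enum host_code { HOST_BUS_BUSY, HOST_TRANSPORT_DISRUPTED, HOST_PARITY };

#define FLAG_FAILFAST_DEV       (1u << 0)
#define FLAG_FAILFAST_TRANSPORT (1u << 1)

/*
 * Returns true when the command should be failed without a retry.
 * 'patched' selects the behavior proposed in the quoted patch;
 * otherwise a disrupted transport always leads to a retry.
 */
bool noretry_model(enum host_code code, unsigned int flags, bool patched)
{
	switch (code) {
	case HOST_TRANSPORT_DISRUPTED:
		if (!patched)
			return false; /* keep retrying while the transport is blocked */
		/* fall through: the patch applies the BUS_BUSY rule here too */
	case HOST_BUS_BUSY:
		return (flags & FLAG_FAILFAST_TRANSPORT) != 0;
	case HOST_PARITY:
		return (flags & FLAG_FAILFAST_DEV) != 0;
	}
	return false;
}
```

With the transport failfast flag set (as on a multipath request), noretry_model(HOST_TRANSPORT_DISRUPTED, FLAG_FAILFAST_TRANSPORT, false) returns false and the same call with patched set to true returns true. In other words, the patch would fail the command over on the first disruption instead of waiting for the timeout to expire.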