On Thu, 2021-01-28 at 07:57 +0100, Hannes Reinecke wrote: > On 1/28/21 1:51 AM, michael.christie@xxxxxxxxxx wrote: > > On 1/26/21 7:02 AM, Hannes Reinecke wrote: > > > When a command is return with DID_TRANSPORT_DISRUPTED we should > > > be looking at the REQ_FAILFAST_TRANSPORT flag and do not retry > > > the command if set. > > > Otherwise multipath will be requeuing a command on the failed > > > path and not fail it over to one of the working paths. > > > > > > Cc: Martin Wilck <martin.wilck@xxxxxxxx> > > > Signed-off-by: Hannes Reinecke <hare@xxxxxxxx> > > > --- > > > drivers/scsi/scsi_error.c | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/drivers/scsi/scsi_error.c > > > b/drivers/scsi/scsi_error.c > > > index a52665eaf288..005118385b70 100644 > > > --- a/drivers/scsi/scsi_error.c > > > +++ b/drivers/scsi/scsi_error.c > > > @@ -1753,6 +1753,7 @@ int scsi_noretry_cmd(struct scsi_cmnd > > > *scmd) > > > case DID_TIME_OUT: > > > goto check_type; > > > case DID_BUS_BUSY: > > > + case DID_TRANSPORT_DISRUPTED: > > > return (scmd->request->cmd_flags & > > > REQ_FAILFAST_TRANSPORT); > > > case DID_PARITY: > > > return (scmd->request->cmd_flags & REQ_FAILFAST_DEV); > > > > We don't fast fail for that error code to avoid churn for > > transient > > transport problems. The FC and iscsi drivers block the > > rport/session, > > return that code and then it's up the fast_io_fail/replacement > > timeout. > > > > _But_ if prevents that command to be failed over to another path, so > essentially we're blocking execution until fast_io_fail tmo. > > For no good reason as we have other paths available. And if we don't? Not everybody sets queue_if_no_path, right? -Ewan > > Cheers, > > Hannes