On 1/28/21 12:01 PM, Mike Christie wrote: > On 1/28/21 12:57 AM, Hannes Reinecke wrote: >> On 1/28/21 1:51 AM, michael.christie@xxxxxxxxxx wrote: >>> On 1/26/21 7:02 AM, Hannes Reinecke wrote: >>>> When a command is return with DID_TRANSPORT_DISRUPTED we should >>>> be looking at the REQ_FAILFAST_TRANSPORT flag and do not retry >>>> the command if set. >>>> Otherwise multipath will be requeuing a command on the failed >>>> path and not fail it over to one of the working paths. >>>> >>>> Cc: Martin Wilck <martin.wilck@xxxxxxxx> >>>> Signed-off-by: Hannes Reinecke <hare@xxxxxxxx> >>>> --- >>>> drivers/scsi/scsi_error.c | 1 + >>>> 1 file changed, 1 insertion(+) >>>> >>>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c >>>> index a52665eaf288..005118385b70 100644 >>>> --- a/drivers/scsi/scsi_error.c >>>> +++ b/drivers/scsi/scsi_error.c >>>> @@ -1753,6 +1753,7 @@ int scsi_noretry_cmd(struct scsi_cmnd *scmd) >>>> case DID_TIME_OUT: >>>> goto check_type; >>>> case DID_BUS_BUSY: >>>> + case DID_TRANSPORT_DISRUPTED: >>>> return (scmd->request->cmd_flags & REQ_FAILFAST_TRANSPORT); >>>> case DID_PARITY: >>>> return (scmd->request->cmd_flags & REQ_FAILFAST_DEV); >>> >>> We don't fast fail for that error code to avoid churn for transient transport problems. The FC and iscsi drivers block the rport/session, return that code and then it's up the fast_io_fail/replacement timeout. >>> >> _But_ if prevents that command to be failed over to another path, so essentially we're blocking execution until fast_io_fail tmo. >> >> For no good reason as we have other paths available. > > It is for a good reason. Causing a failover to another path can lead > to other resources being shifted and putting the system out of > balance. For active/active it's most likely fine, but for active > passive targets and it might be better to wait a second or two to see > if we can recover the optimal/preferred path. > > For iscsi if the user has set fast_io_fail tmo > 0 then they want the > delayed behavior. If the user wants it failed immediately they set > fast_io_tmo (iscsi replacement timeout) to zero. Note for fc you would > need to add code to do the same since zero means off. > > The other issue is that your patch only solves part of the problem. > It would prevent the IO that went through the above code path and new IO > from waiting for fast_io_fail. It does not handle the other IO that would > be caught between dm-multipath and the driver when the rport/session block > completes/starts. In the end the app is probably going to get stuck waiting > for fast io fail tmo to fire. > > I think for both reasons, you want to just make it so the user can set > fast io fail tmo in a way that some value means fail immediately or do > something completely new. For that last part, I think in the short term we could just fix up csi_transport_fc to work like iscsi, but for the longer term to handle cases like you have N>1 paths in a multipath group and are doing active/active across them but then have another group where you have to do active/passive over, then we could do something new. I think we have what we have now, because back when it was made we couldn't tell dm-multipath what the error was, so we had to handle it in both layers like this. Now, we could fail immediately, not block the queues so IO would not get stuck in there, and then dm-multipath and multipathd could do something smart with the error since they know the reason it was failed, and the path and path group layout, and have more info on the device like if it is active passive. For compat and to support what both of us need, we could have a timeout/setting for paths within a path group and one for switching path groups. We could also do different behaviors of the error was DID_TRANSPORT_DISRUPTED vs DID_TRANSPORT_FAILFAST since for the latter the transport timers have fired.