Martin and Hannes, Any resolution to this issue? Thanks, Brian On Fri, Jul 16, 2021 at 11:39 AM Brian Bunker <brian@xxxxxxxxxxxxxxx> wrote: > > On Fri, Jul 16, 2021 at 1:25 AM Martin Wilck <mwilck@xxxxxxxx> wrote: > > > > Hannes, > > > > On Fr, 2021-07-16 at 08:27 +0200, Hannes Reinecke wrote: > > > On 7/15/21 6:57 PM, Brian Bunker wrote: > > > > When paths return an ALUA state transition, do not fail those paths. > > > > The expectation is that the transition is short lived until the new > > > > ALUA state is entered. There might not be other paths in an online > > > > state to serve the request which can lead to an unexpected I/O error > > > > on the multipath device. > > > > > > > > Signed-off-by: Brian Bunker <brian@xxxxxxxxxxxxxxx> > > > > Acked-by: Krishna Kant <krishna.kant@xxxxxxxxxxxxxxx> > > > > Acked-by: Seamus Connor <sconnor@xxxxxxxxxxxxxxx> > > > > -- > > > > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c > > > > index bced42f082b0..28948cc481f9 100644 > > > > --- a/drivers/md/dm-mpath.c > > > > +++ b/drivers/md/dm-mpath.c > > > > @@ -1652,12 +1652,12 @@ static int multipath_end_io(struct dm_target > > > > *ti, struct request *clone, > > > > if (error && blk_path_error(error)) { > > > > struct multipath *m = ti->private; > > > > > > > > - if (error == BLK_STS_RESOURCE) > > > > + if (error == BLK_STS_RESOURCE || error == > > > > BLK_STS_AGAIN) > > > > r = DM_ENDIO_DELAY_REQUEUE; > > > > else > > > > r = DM_ENDIO_REQUEUE; > > > > > > > > - if (pgpath) > > > > + if (pgpath && (error != BLK_STS_AGAIN)) > > > > fail_path(pgpath); > > > > > > > > if (!atomic_read(&m->nr_valid_paths) && > > > > -- > > > > > > Sorry, but this will lead to regressions during failover for arrays > > > taking longer time (some take up to 30 minutes for a complete > > > failover). > > > And for those it's absolutely crucial to _not_ retry I/O on the paths > > > in > > > transitioning. > > > > This won't happen. > > > > As argued in https://marc.info/?l=linux-scsi&m=162625194203635&w=2, > > your previous patches avoid the request being requeued on the SCSI > > queue, even with Brian's patch on top. IMO that means that the deadlock > > situation analyzed earlier can't occur. If you disagree, please explain > > in detail. > > > > By not failing the path, the request can be requeued on the dm level, > > and dm can decide to try the transitioning path again. But the request > > wouldn't be added to the SCSI queue, because alua_prep_fn() would > > prevent that. > > > > So, in the worst case, dm would retry queueing the request on the > > transitioning path over and over again. By adding a delay, we avoid > > busy-looping. This has basically the same effect as queueing on the dm > > layer in the first place: the request stays queued on the dm level most > > of the time. Except for the fact that the queueing would stop earlier: > > as soon as the kernel notices that the path is not transitioning any > > more. By not failing the dm paths, we don't depend on user space to > > reinstate them, which is the right thing to do for a transitioning > > state IMO. > > > > > And you already admitted that 'queue_if_no_path' would resolve this > > > problem, so why not update the device configuration in multipath- > > > tools > > > to have the correct setting for your array? > > The reason I don't like queue_if_no_path for a solution is that there > are times we do want to tail all of the paths as quickly as possible > (e.g. a cluster sharing a resource). If we add some non zero value to > no_path_retry we would be forcing that configuration to unnecessarily > wait, and those customers will see this delay as a regression. This is > where a distinction between an ALUA state of standby or unavailable > vs. a transition ALUA state is attractive. > > > > > I think Brian is right that setting transitioning paths to failed state > > in dm is highly questionable. > > > > So far, in dm-multipath, we haven't set paths to failed state because > > of ALUA state transitions. We've been mapping ALUA states to priorities > > instead. We don't even fail paths in ALUA "unavailable" state, so why > > do it for "transitioning"? > > > > Thanks > > Martin > > > > > > Thanks, > Brian > > -- > Brian Bunker > PURE Storage, Inc. > brian@xxxxxxxxxxxxxxx -- Brian Bunker PURE Storage, Inc. brian@xxxxxxxxxxxxxxx