Hannes,

On Fri, 2021-07-16 at 08:27 +0200, Hannes Reinecke wrote:
> On 7/15/21 6:57 PM, Brian Bunker wrote:
> > When paths return an ALUA state transition, do not fail those
> > paths. The expectation is that the transition is short lived until
> > the new ALUA state is entered. There might not be other paths in an
> > online state to serve the request, which can lead to an unexpected
> > I/O error on the multipath device.
> >
> > Signed-off-by: Brian Bunker <brian@xxxxxxxxxxxxxxx>
> > Acked-by: Krishna Kant <krishna.kant@xxxxxxxxxxxxxxx>
> > Acked-by: Seamus Connor <sconnor@xxxxxxxxxxxxxxx>
> > --
> > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> > index bced42f082b0..28948cc481f9 100644
> > --- a/drivers/md/dm-mpath.c
> > +++ b/drivers/md/dm-mpath.c
> > @@ -1652,12 +1652,12 @@ static int multipath_end_io(struct dm_target *ti, struct request *clone,
> >  	if (error && blk_path_error(error)) {
> >  		struct multipath *m = ti->private;
> >  
> > -		if (error == BLK_STS_RESOURCE)
> > +		if (error == BLK_STS_RESOURCE || error == BLK_STS_AGAIN)
> >  			r = DM_ENDIO_DELAY_REQUEUE;
> >  		else
> >  			r = DM_ENDIO_REQUEUE;
> >  
> > -		if (pgpath)
> > +		if (pgpath && (error != BLK_STS_AGAIN))
> >  			fail_path(pgpath);
> >  
> >  		if (!atomic_read(&m->nr_valid_paths) &&
> > --
>
> Sorry, but this will lead to regressions during failover for arrays
> taking longer time (some take up to 30 minutes for a complete
> failover). And for those it's absolutely crucial to _not_ retry I/O
> on the paths in transitioning.

This won't happen. As argued in
https://marc.info/?l=linux-scsi&m=162625194203635&w=2, your previous
patches prevent the request from being requeued on the SCSI queue,
even with Brian's patch on top. IMO that means that the deadlock
situation analyzed earlier can't occur. If you disagree, please
explain in detail.

By not failing the path, the request can be requeued at the dm level,
and dm can decide to try the transitioning path again. But the request
won't be added to the SCSI queue, because alua_prep_fn() prevents
that. So, in the worst case, dm would retry queueing the request on
the transitioning path over and over again; by adding a delay, we
avoid busy-looping. This has basically the same effect as queueing at
the dm layer in the first place: the request stays queued at the dm
level most of the time. The only difference is that the queueing stops
earlier, as soon as the kernel notices that the path is no longer
transitioning.

By not failing the dm paths, we also don't depend on user space to
reinstate them, which IMO is the right thing to do for a transitioning
state.

> And you already admitted that 'queue_if_no_path' would resolve this
> problem, so why not update the device configuration in
> multipath-tools to have the correct setting for your array?

I think Brian is right that setting transitioning paths to failed
state in dm is highly questionable. So far, dm-multipath has never set
paths to failed state because of ALUA state transitions; we've been
mapping ALUA states to priorities instead. We don't even fail paths in
the ALUA "unavailable" state, so why do it for "transitioning"?

Thanks,
Martin
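
P.S.: To make the alua_prep_fn() argument above concrete, here is a
sketch (written from memory and simplified, so the actual code in
drivers/scsi/device_handler/scsi_dh_alua.c may differ in detail) of
how a transitioning port group keeps a request off the SCSI queue:

static blk_status_t alua_prep_fn(struct scsi_device *sdev,
				 struct request *req)
{
	struct alua_dh_data *h = sdev->handler_data;
	struct alua_port_group *pg;
	unsigned char state = SCSI_ACCESS_STATE_OPTIMAL;

	rcu_read_lock();
	pg = rcu_dereference(h->pg);
	if (pg)
		state = pg->state;
	rcu_read_unlock();

	switch (state) {
	case SCSI_ACCESS_STATE_OPTIMAL:
	case SCSI_ACCESS_STATE_ACTIVE:
	case SCSI_ACCESS_STATE_LBA:
		/* Usable path: let the request proceed to the device. */
		return BLK_STS_OK;
	case SCSI_ACCESS_STATE_TRANSITIONING:
		/*
		 * Transitioning path: refuse to queue. With Brian's
		 * patch, dm-mpath maps BLK_STS_AGAIN to
		 * DM_ENDIO_DELAY_REQUEUE without failing the path, so
		 * the request waits at the dm level instead.
		 */
		return BLK_STS_AGAIN;
	default:
		req->rq_flags |= RQF_QUIET;
		return BLK_STS_IOERR;
	}
}

IOW, as long as the path is transitioning, retries never reach the
SCSI layer; they just bounce between dm-mpath and this prep function,
with a delay in between.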