On Fri, Jul 16, 2021 at 1:25 AM Martin Wilck <mwilck@xxxxxxxx> wrote:
>
> Hannes,
>
> On Fr, 2021-07-16 at 08:27 +0200, Hannes Reinecke wrote:
> > On 7/15/21 6:57 PM, Brian Bunker wrote:
> > > When paths return an ALUA state transition, do not fail those paths.
> > > The expectation is that the transition is short lived until the new
> > > ALUA state is entered. There might not be other paths in an online
> > > state to serve the request, which can lead to an unexpected I/O error
> > > on the multipath device.
> > >
> > > Signed-off-by: Brian Bunker <brian@xxxxxxxxxxxxxxx>
> > > Acked-by: Krishna Kant <krishna.kant@xxxxxxxxxxxxxxx>
> > > Acked-by: Seamus Connor <sconnor@xxxxxxxxxxxxxxx>
> > > --
> > > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> > > index bced42f082b0..28948cc481f9 100644
> > > --- a/drivers/md/dm-mpath.c
> > > +++ b/drivers/md/dm-mpath.c
> > > @@ -1652,12 +1652,12 @@ static int multipath_end_io(struct dm_target *ti, struct request *clone,
> > >         if (error && blk_path_error(error)) {
> > >                 struct multipath *m = ti->private;
> > >
> > > -               if (error == BLK_STS_RESOURCE)
> > > +               if (error == BLK_STS_RESOURCE || error == BLK_STS_AGAIN)
> > >                         r = DM_ENDIO_DELAY_REQUEUE;
> > >                 else
> > >                         r = DM_ENDIO_REQUEUE;
> > >
> > > -               if (pgpath)
> > > +               if (pgpath && (error != BLK_STS_AGAIN))
> > >                         fail_path(pgpath);
> > >
> > >                 if (!atomic_read(&m->nr_valid_paths) &&
> > > --
> >
> > Sorry, but this will lead to regressions during failover for arrays
> > taking a longer time (some take up to 30 minutes for a complete
> > failover). And for those it's absolutely crucial to _not_ retry I/O
> > on the paths in transitioning.
>
> This won't happen.
>
> As argued in https://marc.info/?l=linux-scsi&m=162625194203635&w=2,
> your previous patches avoid the request being requeued on the SCSI
> queue, even with Brian's patch on top. IMO that means that the deadlock
> situation analyzed earlier can't occur. If you disagree, please explain
> in detail.
>
> By not failing the path, the request can be requeued on the dm level,
> and dm can decide to try the transitioning path again. But the request
> wouldn't be added to the SCSI queue, because alua_prep_fn() would
> prevent that.
>
> So, in the worst case, dm would retry queueing the request on the
> transitioning path over and over again. By adding a delay, we avoid
> busy-looping. This has basically the same effect as queueing on the dm
> layer in the first place: the request stays queued on the dm level most
> of the time, except that the queueing would stop earlier: as soon as
> the kernel notices that the path is no longer transitioning. By not
> failing the dm paths, we don't depend on user space to reinstate them,
> which is the right thing to do for a transitioning state IMO.
>
> > And you already admitted that 'queue_if_no_path' would resolve this
> > problem, so why not update the device configuration in multipath-tools
> > to have the correct setting for your array?

The reason I don't like queue_if_no_path as a solution is that there are
times when we do want to fail all of the paths as quickly as possible
(e.g. a cluster sharing a resource). If we add a non-zero value to
no_path_retry, we would force that configuration to wait unnecessarily,
and those customers would see the delay as a regression. This is where a
distinction between the ALUA standby or unavailable states and the
transitioning ALUA state is attractive.
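For reference, I read that suggestion as carrying a device override roughly
like the following in multipath.conf (a sketch only; the vendor/product
strings and the retry count are illustrative, not a recommendation):

    devices {
        device {
            vendor  "PURE"
            product "FlashArray"
            # Queue I/O at the dm level instead of failing it when no
            # usable path is left, either indefinitely ...
            no_path_retry queue
            # ... or for a bounded number of path-checker intervals:
            # no_path_retry 12
        }
    }

With either variant, the cluster case above has to wait for the queueing
to end (or for manual intervention) before the I/O error is surfaced,
which is exactly the delay I'd like to avoid.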
>
> I think Brian is right that setting transitioning paths to failed state
> in dm is highly questionable.
>
> So far, in dm-multipath, we haven't set paths to failed state because
> of ALUA state transitions. We've been mapping ALUA states to priorities
> instead. We don't even fail paths in ALUA "unavailable" state, so why
> do it for "transitioning"?
>
> Thanks
> Martin
>

Thanks,
Brian
--
Brian Bunker
PURE Storage, Inc.
brian@xxxxxxxxxxxxxxx