Re: [PATCH] multipath-tools: prevent a kernel upgrade leading to all paths lost

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This did work, but I agree, it is not the best approach. I was looking
for a backstop in case I could not get the kernel change that was the
real fix to the issue. I wanted to give a solution to our customers
who would upgrade into the problem. Having multitpath-tools dealing
with a fail path that isn't really a fail path is clumsy at best. The
kernel fix got in here:

https://git.kernel.org/mkp/scsi/c/6056a92ceb2a

So this patch is no longer needed.

Thanks,
Brian

On Fri, Apr 29, 2022 at 7:31 PM Benjamin Marzinski <bmarzins@xxxxxxxxxx> wrote:
>
> On Mon, Apr 18, 2022 at 02:54:15PM -0700, Brian Bunker wrote:
> > With the kernel commit 268940b80fa4096397fd0a28e6ad807e64120215, the
> > handling of ALUA state transitioning state changed. It used to be that
> > paths which returned with ALUA transitioning state would not result in
> > those paths being failed. With this patch, that state returned will
> > result in the path being failed. The result of this will result in all
> > paths down events when a configuration has the no_path_retry set to 0.
> > Before this kernel change that same configuration would not reach all
> > paths down.
> >
> > If the kernel is not changed back to the previous behavior, the
> > multipath-tools have to protect against this condition or
> > configurations which were working correctly prior to the kernel change
> > will lead to an unexpected failure.
>
> Unless I'm missing something, this patch isn't going to be helpful. It
> doesn't configure the multipath device to queue IOs before the failure.
> When the kernel runs out of paths, it will see that the mutltipath
> device has no paths and isn't set to queue, and it will fail the IOs.
> After the paths come back up again, multipath will enable queueing in
> update_queue_mode_add_path().  But it's too late by then. The IOs will
> have already been failed.
>
> Another issue is that the patch is doing scsi specific work in code
> meant for general block devices. Finally, It's seems pointless to set
> the device to queue when all the paths go away and the turn off queueing
> when the paths come back. no_path_retry doesn't do anything when there
> are paths, so if it's necessary to set when the paths are gone, it won't
> hurt anything to be set when the paths are present. I understand that
> you only want to do this for devices in the ALUA transtitioning state,
> but the only why to make this actually help is if multipathd enables
> queueing before the device ever gets into that state.
>
> The only really safe thing to do would be to set no_path_retry to
> something other than 0 or NO_PATH_RETRY_QUEUE. This can already be done
> in the configuration files. But I agree that failing in IOs because the
> device is transitioning seems wrong in the first place, and fixing this
> in the kernel makes sense.
>
> > See the kernel discussions here:
> > https://marc.info/?l=linux-scsi&m=162636826305740&w=2
> > https://marc.info/?l=linux-scsi&m=164987222322261&w=2
> >
> > Signed-off-by: brian@xxxxxxxxxxxxxxx
> > Reviewed-by: sconnor@xxxxxxxxxxxxxxx
> > ___
> > diff --git a/libmultipath/structs.h b/libmultipath/structs.h
> > index d94f93a0..0af7e598 100644
> > --- a/libmultipath/structs.h
> > +++ b/libmultipath/structs.h
> > @@ -370,6 +370,7 @@ struct multipath {
> >         int failback_tick;
> >
> >         int no_path_retry; /* number of retries after all paths are down */
> > +       int no_path_retry_configured; /* value in config */
> >         int retry_tick;    /* remaining times for retries */
> >         int disable_queueing;
> >         int minio;
> > diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
> > index 6c23df86..3db4e6f7 100644
> > --- a/libmultipath/structs_vec.c
> > +++ b/libmultipath/structs_vec.c
> > @@ -780,8 +780,15 @@ void update_queue_mode_add_path(struct multipath *mpp)
> >  {
> >         int active = count_active_paths(mpp);
> >
> > -       if (active > 0)
> > +       if (active > 0) {
> >                 leave_recovery_mode(mpp);
> > +               if (mpp->no_path_retry != mpp->no_path_retry_configured) {
> > +                       condlog(2, "%s: return no path retry to %d
> > from %d", mpp->alias,
> > +                       mpp->no_path_retry_configured, mpp->no_path_retry);
> > +                       mpp->no_path_retry = mpp->no_path_retry_configured;
> > +               }
> > +       }
> > +
> >         condlog(2, "%s: remaining active paths: %d", mpp->alias, active);
> >  }
> >
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index f2c0b280..92841d13 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -524,7 +524,7 @@ int update_multipath (struct vectors *vecs, char
> > *mapname, int reset)
> >         struct multipath *mpp;
> >         struct pathgroup  *pgp;
> >         struct path *pp;
> > -       int i, j;
> > +       int i, j, tpg;
> >
> >         mpp = find_mp_by_alias(vecs->mpvec, mapname);
> >
> > @@ -553,6 +553,15 @@ int update_multipath (struct vectors *vecs, char
> > *mapname, int reset)
> >                                 checkint = conf->checkint;
> >                                 put_multipath_config(conf);
> >                                 condlog(2, "%s: mark as failed", pp->dev);
> > +                               tpg = get_target_port_group(pp, DEF_TIMEOUT);
> > +                               if ((tpg >= 0) &&
> > +                                   (mpp->no_path_retry == 0) &&
> > +                                   (get_asymmetric_access_state(pp,
> > tpg, DEF_TIMEOUT) == AAS_TRANSITIONING)) {
> > +                                       mpp->no_path_retry_configured
> > = mpp->no_path_retry;
> > +                                       mpp->no_path_retry = (60 / checkint);
> > +                                       condlog(2, "%s: changed %s no
> > path retry to %d", pp->dev, mpp->alias,
> > +                                               (60 / checkint));
> > +                               }
> >                                 mpp->stat_path_failures++;
> >                                 pp->state = PATH_DOWN;
> >                                 if (oldstate == PATH_UP ||
> > ___
> >
> > --
> > Brian Bunker
> > PURE Storage, Inc.
> > brian@xxxxxxxxxxxxxxx
> >
> > --
> > dm-devel mailing list
> > dm-devel@xxxxxxxxxx
> > https://listman.redhat.com/mailman/listinfo/dm-devel
>


-- 
Brian Bunker
PURE Storage, Inc.
brian@xxxxxxxxxxxxxxx

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel




[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux