Re: [PATCH 2/2] multipathd: add recheck_wwid_time option to verify the path wwid

Benjamin Marzinski <bmarzins@xxxxxxxxxx> · Wed, 10 Feb 2021 22:48:33 -0600

On Tue, Feb 09, 2021 at 10:19:45PM +0000, Martin Wilck wrote:
> On Mon, 2021-02-08 at 23:19 -0600, Benjamin Marzinski wrote:
> > There are cases where the wwid of a path changes due to LUN remapping
> > without triggering uevent for the changed path. Multipathd has no
> > method
> > for trying to catch these cases, and corruption has resulted because
> > of
> > it.
> > 
> > In order to have a better chance at catching these cases, multipath
> > now
> > has a recheck_wwid_time option, which can either be set to "off" or a
> > number of seconds. If a path is failed for equal to or greater than
> > the
> > configured number of seconds, multipathd will recheck its wwid before
> > restoring it, when the path checker sees that it has come back up.
> 
> Can't the WWID change also happen without the path going offline, or
> at least without being offline long enough that multipathd would
> notice?

Yes. There is no way to guarantee that you won't hit this issue. That's
why this is configurable and disableable. People get to choose how
likely they are to shoot themselves in the foot.

> 
> >  If
> > multipathd notices that a path's wwid has changed it will remove and
> > re-add the path, just like the existing wwid checking code for change
> > events does.  In cases where the no uevent occurs, both the udev
> > database entry and sysfs will have the old wwid, so the only way to
> > get
> > a current wwid is to ask the device directly. 
> 
> sysfs is wrong too, really? In that case I fear triggering an uevent
> won't fix the situation. You need to force the kernel to rescan the
> device, otherwise udev will fetch the WWID from sysfs again, which
> still has the wrong ID... or what am I missing here?

In the reproducer I posted using targetcli and FC devices, sysfs is
wrong.  That does make me a little leary about simply re-adding these
devices as new.  udev will run scsi_id to grab the new WWID and store
that in the udev database, but if we ever fail back to looking at sysfs,
we will still see the old data.  In general, having devices on the
system with bad sysfs data seems dangerous.

> > > Currently multipath only
> > has code to directly get the wwid for scsi devices, so this option
> > only
> > effects scsi devices. Also, since it's possible the the udev wwid
> > won't
> > match the wwid from get_vpd_sgio(), if multipathd doesn't initially
> > see
> > the two values matching for a device, it will disable this option for
> > that device.
> > 
> > If recheck_wwid_time is not turned off, multipathd will also
> > automatically recheck the wwid whenever an existing path gets a add
> > event, or is manually re-added with cli_add_path().
> > 
> > Co-developed-by: Chongyun Wu <wucy11@xxxxxxxxxxxxxxx>
> > Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx>
> 
> I am uncertain about this.
> 
> We get one more configuration option that defaults to off and that only
> the truly inaugurated will understand and use. And even those will not
> know how to set the recheck time. Should it be 1s, 10, or 100? We
> already have too many of these options in multipath-tools. We shy away
> from giving users reasonable defaults, with the result that most people
> won't bother.
> 
> I generally don't understand what the UP/DOWN state has to do with
> this. If the WWID can change without any events seen by either the
> kernel or user space, why would the path go down and up again? And even
> if so, why would it matter how long the device remained down?

A new LUN can't get remapped to a still-being-used LUN id. The LUN that
was previously mapped to that id must get unmapped first.  That will
cause IO and and checker commands to fail. So unless a LUN gets unmapped
from a LUN id, and a new one remapped to that id quick enough for no IO
and or checker commands to go to it, the path should go down. In the
case that spurred this development, the path was down for hours before a
new LUN was remapped to that ID.

> But foremost, do we really have to try to deal with configuration
> mistakes as blatant as this? What if a user sets the same WWID for
> different devices, or re-uses the same WWID on different storage
> servers? I already hesitated about the code I added myself for catching
> user errors in the WWIDs file, but this seems even more far-fetched.
> 
> Please convince me.

Err.. An important customer corrupted their data and while they admit
that they were at fault, it's hard for them to guarantee that something
like this won't happen in the future, and they asked if multipath could
do a better job of catching these sorts of mistakes. Obviously this is
more convincing when it's coming from your customer. But the fact still
stands that this has happened to multiple users even with our existing
code to catch this.

Since this isn't a problem that can always be fixed, the best we can do
is try to catch it before something bad happens.  If the path is
remapped before it goes down, then corruption can happen before there
is any possiblity of doing anything. That case is unsolvable. But if
the path does go down when the LUN is unmapped, then there is a chance
to catch this before damage is done.

Now, obviously if you don't set this to 0, then it's possible that the
path gets unmapped and goes down, but comes up before your timeout, and
you don't catch it in time.  Really, this is a dial that nobody that
hasn't got bitten by this issue cares about, but everyone who has had
this happen really wants to be there.

> This said, I'd like to understand why there are no events in these
> cases. I'd have thought we'd at least get a UNIT ATTENTION (REPORTED
> LUNS DATA HAS CHANGED), which would have caused a uevent. If there was
> no UNIT ATTENTION, I'd blame the storage side. 

Are you able to try the reproducer I targetcli FC reproducer I posted?

> Maybe we need to monitor scsi_device uevents.
> 
> Technical remarks below.
> 
> 
> > ---
> >  libmultipath/config.c             |  1 +
> >  libmultipath/config.h             |  1 +
> >  libmultipath/configure.c          |  4 +-
> >  libmultipath/configure.h          |  2 +
> >  libmultipath/defaults.h           |  1 +
> >  libmultipath/dict.c               | 36 ++++++++++++
> >  libmultipath/libmultipath.version |  6 ++
> >  libmultipath/structs.h            | 10 ++++
> >  multipath/multipath.conf.5        | 18 ++++++
> >  multipathd/cli_handlers.c         |  9 +++
> >  multipathd/main.c                 | 92
> > +++++++++++++++++++++++++++++++
> >  multipathd/main.h                 |  2 +
> >  12 files changed, 180 insertions(+), 2 deletions(-)
> >  .
> >  .\" ----------------------------------------------------------------
> > ------------
> > diff --git a/multipathd/cli_handlers.c b/multipathd/cli_handlers.c
> > index 54635738..970d5e21 100644
> > --- a/multipathd/cli_handlers.c
> > +++ b/multipathd/cli_handlers.c
> > @@ -715,6 +715,15 @@ cli_add_path (void * v, char ** reply, int *
> > len, void * data)
> >         pp = find_path_by_dev(vecs->pathvec, param);
> >         if (pp && pp->initialized != INIT_REMOVED) {
> >                 condlog(2, "%s: path already in pathvec", param);
> > +
> > +               if (pp->allow_wwid_recheck == ALLOW_WWID_RECHECK_ON
> > &&
> > +                   check_path_wwid_change(pp)) {
> > +                       condlog(0, "%s: wwid changed. Removing
> > device",
> > +                               pp->dev);
> > +                       handle_path_wwid_change(pp, vecs);
> > +                       return 1;
> > +               }
> > +
> >                 if (pp->mpp)
> >                         return 0;
> >         } else if (pp) {
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index 19679848..17eef3a4 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -823,6 +823,59 @@ ev_remove_map (char * devname, char * alias, int
> > minor, struct vectors * vecs)
> >         return flush_map(mpp, vecs, 0);
> >  }
> >  
> > +void
> > +handle_path_wwid_change(struct path *pp, struct vectors *vecs)
> > +{
> > +       const char *action = "add";
> > +       if (!pp || !pp->udev)
> > +               return;
> > +
> > +       sysfs_attr_set_value(pp->udev, "uevent", action,
> > strlen(action));
> > +       trigger_partitions_udev_change(pp->udev, action,
> > strlen(action));
> 
> Nit: it looks a bit weird to use a const char * variable for "action"
> and a constant for "uevent".

It does mean that there's no chance of typo'ing it one of the four times
it's used and not having it be caught, but I'm fine with changing it.

> > +       if (ev_remove_path(pp, vecs, 1) != 0 && pp->mpp) {
> > +               pp->dmstate = PSTATE_FAILED;
> > +               dm_fail_path(pp->mpp->alias, pp->dev_t);
> > +       }
> 
> I suggest taking a ref on pp->udev, calling ev_remove_path(), and
> triggering the uevent after that.
>

vecs locking will save us from handling the uevent before we remove the
path, but your ordering does make things look more obviously correct.
I'm fine with changing it.

> > +}
> > +
> > +bool
> > +check_path_wwid_change(struct path *pp)
> > +{
> > +       char wwid[WWID_SIZE];
> > +       int len = 0;
> > +       char *c;
> > +
> > +       if (!strlen(pp->wwid) || pp->bus != SYSFS_BUS_SCSI)
> > +               return false;
> 
> Maybe you should look at uid_attribute here, to be consistent with
> has_uid_fallback()?

Possibly, be this code will never be run to see if wwid has changed
without first having been run and verifying that the wwids match. If
the wwids don't match the first time, then it's disabled.

> > +
> > +       /* Get the real fresh device wwid by sgio. sysfs still has
> > old
> > +        * data, so only get_vpd_sgio will work to get the new wwid
> > */
> > +       len = get_vpd_sgio(pp->fd, 0x83, 0, wwid, WWID_SIZE);
> > +
> > +       if (len <= 0) {
> > +               condlog(2, "%s: failed to check wwid by sgio: len =
> > %d",
> > +                       pp->dev, len);
> > +               return false;
> > +       }
> > +
> > +       /*Strip any trailing blanks */
> > +       c = strchr(wwid, '\0');
> 
> Quite unusual, why not use strlen() or strnlen()?

This was pulled directly from get_uid(). But I agree it could be cleaned
up in both places

> 
> > +       c--;
> > +       while (c && c >= wwid && *c == ' ') {
> > +               *c = '\0';
> > +               c--;
> > +       }
> 
> Nit: You don't have to nullify every space. Just the first one.
Again, this is just a copy of get_uid().

> 
> > +       condlog(4, "%s: Got wwid %s by sgio", pp->dev, wwid);
> > +
> > +       if (strncmp(wwid, pp->wwid, WWID_SIZE)) {
> > +               condlog(0, "%s: wwid '%s' doesn't match wwid '%s'
> > from device",
> > +                       pp->dev, pp->wwid, wwid);
> > +               return true;
> > +       }
> > +
> > +       return false;
> > +}
> > +
> >  static int
> >  uev_add_path (struct uevent *uev, struct vectors * vecs, int
> > need_do_map)
> >  {
> > @@ -919,6 +972,21 @@ uev_add_path (struct uevent *uev, struct vectors
> > * vecs, int need_do_map)
> >                                         uev->kernel);
> >                                 ret = 1;
> >                         }
> > +               } else if (pp->allow_wwid_recheck ==
> > ALLOW_WWID_RECHECK_ON &&
> > +                          check_path_wwid_change(pp)) {
> > +                       condlog(2, "%s wwid change detected when
> > processing add uevent. removing path", pp->dev);
> > +                       /*
> > +                        * don't use handle_path_wwid_change here,
> > +                        * since that would just trigger another add
> > +                        * uevent
> > +                        */
> > +                       ret = ev_remove_path(pp, vecs, true);
> > +                       if (ret == 0)
> > +                               pp = NULL;
> > +                       else if (pp->mpp) {
> > +                               pp->dmstate = PSTATE_FAILED;
> > +                               dm_fail_path(pp->mpp->alias, pp-
> > >dev_t);
> > +                       }
> 
> What's the purpose of this code? Are you handling your own artificial
> "add" event here, which you triggered before in
> handle_path_wwid_change()? Or are there real cases where the kernel
> would send an "add" event without sending a "remove" event first?
>

This shouldn't be for handling our own add event. Either the
ev_remove_path() in handle_path_wwid_changed() succeeded, and there is
no path in pathvec, or it failed, and pp->initialized should be set to
INIT_REMOVED. Either way, we miss this code path. This is simply if we
get an "add" event without the "remove" event, which is a real thing
that has happened when LUNs get remapped.

> >                 }
> >         }
> >         if (pp)
> > @@ -2049,6 +2117,7 @@ check_path (struct vectors * vecs, struct path
> > * pp, unsigned int ticks)
> >         unsigned int checkint, max_checkint;
> >         struct config *conf;
> >         int marginal_pathgroups, marginal_changed = 0;
> > +       int recheck_wwid_time;
> >         int ret;
> >  
> >         if (((pp->initialized == INIT_OK ||
> > @@ -2066,6 +2135,7 @@ check_path (struct vectors * vecs, struct path
> > * pp, unsigned int ticks)
> >         checkint = conf->checkint;
> >         max_checkint = conf->max_checkint;
> >         marginal_pathgroups = conf->marginal_pathgroups;
> > +       recheck_wwid_time = conf->recheck_wwid_time;
> >         put_multipath_config(conf);
> >  
> >         if (pp->checkint == CHECKINT_UNDEF) {
> > @@ -2197,6 +2267,26 @@ check_path (struct vectors * vecs, struct path
> > * pp, unsigned int ticks)
> >                 return 0;
> >         set_no_path_retry(pp->mpp);
> >  
> > +       if (recheck_wwid_time != RECHECK_WWID_OFF &&
> > +           (newstate == PATH_UP || newstate == PATH_GHOST)) {
> > +               if (pp->allow_wwid_recheck ==
> > ALLOW_WWID_RECHECK_UNSET) {
> > +                       if (check_path_wwid_change(pp))
> > +                               pp->allow_wwid_recheck =
> > ALLOW_WWID_RECHECK_OFF;
> > +                       else
> > +                               pp->allow_wwid_recheck =
> > ALLOW_WWID_RECHECK_ON;
> 
> This is confusing. I pulled my hair over this code before I read your
> man page hunk: "When the path is originally added, if the path's
> configured wwid doesn't match the wwid retrieved directly
> from the scsi device, rechecks will be disabled for the device."
> 
> So, I gather this is an alternative to the has_uid_fallback() logic
> mentioned above. It deserves at least a comment here. Please consider
> if using the same logic as has_uid_falback() isn't just as good as
> this.

I'm fine with using the same logic as has_uid_fallback() to determine if
if we can use get_vpd_sgio().

-Ben

> Btw, as we're already pretty much on corner-case ground here, if the
> path fails quickly after being detected, a WWID change could have
> occured already when it comes UP first, and this code is run for the
> first time.
> 
> 
> > +               } else if (((pp->state != PATH_UP && pp->state !=
> > PATH_GHOST) ||
> > +                           pp->dmstate == PSTATE_FAILED) &&
> > +                          pp->failed_time >= recheck_wwid_time &&
> > +                          pp->allow_wwid_recheck ==
> > ALLOW_WWID_RECHECK_ON &&
> > +                          check_path_wwid_change(pp)) {
> > +                       condlog(0, "%s: path wwid change detected.
> > Removing",
> > +                               pp->dev);
> > +                       handle_path_wwid_change(pp, vecs);
> > +                       return 0;
> > +               }
> > +               pp->failed_time = 0;
> > +       }
> > +
> >         if ((newstate == PATH_UP || newstate == PATH_GHOST) &&
> >             (san_path_check_enabled(pp->mpp) ||
> >              marginal_path_check_enabled(pp->mpp))) {
> > @@ -2330,6 +2420,8 @@ check_path (struct vectors * vecs, struct path
> > * pp, unsigned int ticks)
> >                 if (newstate == PATH_DOWN) {
> >                         int log_checker_err;
> >  
> > +                       if (recheck_wwid_time != RECHECK_WWID_OFF)
> > +                               pp->failed_time += pp->checkint;
> >                         conf = get_multipath_config();
> >                         log_checker_err = conf->log_checker_err;
> >                         put_multipath_config(conf);
> > diff --git a/multipathd/main.h b/multipathd/main.h
> > index 5abbe97b..ddd953f9 100644
> > --- a/multipathd/main.h
> > +++ b/multipathd/main.h
> > @@ -50,4 +50,6 @@ int update_multipath (struct vectors *vecs, char
> > *mapname, int reset);
> >  int reload_and_sync_map(struct multipath *mpp, struct vectors *vecs,
> >                         int refresh);
> >  
> > +void handle_path_wwid_change(struct path *pp, struct vectors *vecs);
> > +bool check_path_wwid_change(struct path *pp);
> >  #endif /* MAIN_H */
> 
> -- 
> Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107
> SUSE Software Solutions Germany GmbH
> HRB 36809, AG Nürnberg GF: Felix Imendörffer
> 

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel