On Thu, Jun 18, 2020 at 07:34:38PM +0000, Martin Wilck wrote: > On Wed, 2020-06-17 at 19:24 -0500, Benjamin Marzinski wrote: > > If a multipath device is removed during, or immediately before the > > call > > to check_path(), multipathd can behave incorrectly. A missing > > multpath > > device will cause update_multipath_strings() to fail, setting > > pp->dmstate to PSTATE_UNDEF. If the path is up, this state will > > cause > > reinstate_path() to be called, which will also fail. This will > > trigger > > a reload, restoring the recently removed device. > > > > If update_multipath_strings() fails because there is no multipath > > device, check_path should just quit, since the remove dmevent and > > uevent > > are likely already queued up. Also, I don't see any reason to reload > > the > > multipath device if reinstate fails. This code was added by > > fac68d7a99ef17d496079538a5c6836acd7911ab, which clamined that > > reinstate > > could fail if the path was disabled. Looking through the current > > kernel > > code, I can't see any reason why a reinstate would fail, where a > > reload > > would help. If the path was missing from the multipath device, > > update_multipath_strings() would already catch that, and quit > > check_path() early, which make more sense to me than reloading does. > > fac68d7 is related to the famous "dm-multipath: Accept failed paths for > multipath maps" patch (e.g. > https://patchwork.kernel.org/patch/3368381/#7193001), which never made > it upstream. SUSE kernels have shipped this patch for a long time, but > we don't apply it any more in recent kernels. > > With this patch, The reinstate_path() failure would occur if multipathd > had created a table with a "disabled" device (one which would be > present in a dm map even though the actual block device didn't exist), > and then tried to reinstate such a path. It sounds unlikely, but it > might be possible if devices are coming and going in quick succession. > In that situation, and with the "accept failed path" patch applied, a > reload makes some sense, because reload (unlike reinstate) would not > fail (at least not for this reason) and would actually add that just- > reinstated path. OTOH, it's not likely that the reload would improve > matters, either. After all, multipathd is just trying to reinstate a > non-existing path. So, I'm fine with skipping the reload. > > > > > Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx> > > --- > > multipathd/main.c | 36 ++++++++++++++---------------------- > > 1 file changed, 14 insertions(+), 22 deletions(-) > > > > diff --git a/multipathd/main.c b/multipathd/main.c > > index 6b7db2c0..8fb73922 100644 > > --- a/multipathd/main.c > > +++ b/multipathd/main.c > > @@ -1611,22 +1611,18 @@ fail_path (struct path * pp, int del_active) > > /* > > * caller must have locked the path list before calling that > > function > > */ > > -static int > > +static void > > reinstate_path (struct path * pp) > > { > > - int ret = 0; > > - > > if (!pp->mpp) > > - return 0; > > + return; > > > > - if (dm_reinstate_path(pp->mpp->alias, pp->dev_t)) { > > + if (dm_reinstate_path(pp->mpp->alias, pp->dev_t)) > > condlog(0, "%s: reinstate failed", pp->dev_t); > > - ret = 1; > > - } else { > > + else { > > condlog(2, "%s: reinstated", pp->dev_t); > > update_queue_mode_add_path(pp->mpp); > > } > > - return ret; > > } > > static void > > @@ -2088,8 +2084,13 @@ check_path (struct vectors * vecs, struct path > > * pp, unsigned int ticks) > > * Synchronize with kernel state > > */ > > if (update_multipath_strings(pp->mpp, vecs->pathvec, 1)) { > > + struct dm_info info; > > condlog(1, "%s: Could not synchronize with kernel > > state", > > pp->dev); > > + if (pp->mpp && pp->mpp->alias && > > + do_dm_get_info(pp->mpp->alias, &info) == 0) > > + /* multipath device missing. Likely removed */ > > + return 0; > > pp->dmstate = PSTATE_UNDEF; > > It would be more elegant if we could distinguish different failure > modes from update_multipath_strings() directly, without having to call > do_dm_get_info() again. Seems reasonable. I'll take a look at that. -Ben > > } > > /* if update_multipath_strings orphaned the path, quit early */ > > @@ -2179,12 +2180,8 @@ check_path (struct vectors * vecs, struct path > > * pp, unsigned int ticks) > > /* > > * reinstate this path > > */ > > - if (!disable_reinstate && reinstate_path(pp)) { > > - condlog(3, "%s: reload map", pp->dev); > > - ev_add_path(pp, vecs, 1); > > - pp->tick = 1; > > - return 0; > > - } > > + if (!disable_reinstate) > > + reinstate_path(pp); > > new_path_up = 1; > > > > if (oldchkrstate != PATH_UP && oldchkrstate != > > PATH_GHOST) > > @@ -2200,15 +2197,10 @@ check_path (struct vectors * vecs, struct > > path * pp, unsigned int ticks) > > else if (newstate == PATH_UP || newstate == PATH_GHOST) { > > if ((pp->dmstate == PSTATE_FAILED || > > pp->dmstate == PSTATE_UNDEF) && > > - !disable_reinstate) { > > + !disable_reinstate) > > /* Clear IO errors */ > > - if (reinstate_path(pp)) { > > - condlog(3, "%s: reload map", pp->dev); > > - ev_add_path(pp, vecs, 1); > > - pp->tick = 1; > > - return 0; > > - } > > - } else { > > + reinstate_path(pp); > > + else { > > LOG_MSG(4, verbosity, pp); > > if (pp->checkint != max_checkint) { > > /* > > -- > Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107 > SUSE Software Solutions Germany GmbH > HRB 36809, AG Nürnberg GF: Felix > Imendörffer > -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel