On Wed, 2020-06-17 at 19:24 -0500, Benjamin Marzinski wrote: > If a multipath device is removed during, or immediately before the > call > to check_path(), multipathd can behave incorrectly. A missing > multpath > device will cause update_multipath_strings() to fail, setting > pp->dmstate to PSTATE_UNDEF. If the path is up, this state will > cause > reinstate_path() to be called, which will also fail. This will > trigger > a reload, restoring the recently removed device. > > If update_multipath_strings() fails because there is no multipath > device, check_path should just quit, since the remove dmevent and > uevent > are likely already queued up. Also, I don't see any reason to reload > the > multipath device if reinstate fails. This code was added by > fac68d7a99ef17d496079538a5c6836acd7911ab, which clamined that > reinstate > could fail if the path was disabled. Looking through the current > kernel > code, I can't see any reason why a reinstate would fail, where a > reload > would help. If the path was missing from the multipath device, > update_multipath_strings() would already catch that, and quit > check_path() early, which make more sense to me than reloading does. fac68d7 is related to the famous "dm-multipath: Accept failed paths for multipath maps" patch (e.g. https://patchwork.kernel.org/patch/3368381/#7193001), which never made it upstream. SUSE kernels have shipped this patch for a long time, but we don't apply it any more in recent kernels. With this patch, The reinstate_path() failure would occur if multipathd had created a table with a "disabled" device (one which would be present in a dm map even though the actual block device didn't exist), and then tried to reinstate such a path. It sounds unlikely, but it might be possible if devices are coming and going in quick succession. In that situation, and with the "accept failed path" patch applied, a reload makes some sense, because reload (unlike reinstate) would not fail (at least not for this reason) and would actually add that just- reinstated path. OTOH, it's not likely that the reload would improve matters, either. After all, multipathd is just trying to reinstate a non-existing path. So, I'm fine with skipping the reload. > > Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx> > --- > multipathd/main.c | 36 ++++++++++++++---------------------- > 1 file changed, 14 insertions(+), 22 deletions(-) > > diff --git a/multipathd/main.c b/multipathd/main.c > index 6b7db2c0..8fb73922 100644 > --- a/multipathd/main.c > +++ b/multipathd/main.c > @@ -1611,22 +1611,18 @@ fail_path (struct path * pp, int del_active) > /* > * caller must have locked the path list before calling that > function > */ > -static int > +static void > reinstate_path (struct path * pp) > { > - int ret = 0; > - > if (!pp->mpp) > - return 0; > + return; > > - if (dm_reinstate_path(pp->mpp->alias, pp->dev_t)) { > + if (dm_reinstate_path(pp->mpp->alias, pp->dev_t)) > condlog(0, "%s: reinstate failed", pp->dev_t); > - ret = 1; > - } else { > + else { > condlog(2, "%s: reinstated", pp->dev_t); > update_queue_mode_add_path(pp->mpp); > } > - return ret; > } > static void > @@ -2088,8 +2084,13 @@ check_path (struct vectors * vecs, struct path > * pp, unsigned int ticks) > * Synchronize with kernel state > */ > if (update_multipath_strings(pp->mpp, vecs->pathvec, 1)) { > + struct dm_info info; > condlog(1, "%s: Could not synchronize with kernel > state", > pp->dev); > + if (pp->mpp && pp->mpp->alias && > + do_dm_get_info(pp->mpp->alias, &info) == 0) > + /* multipath device missing. Likely removed */ > + return 0; > pp->dmstate = PSTATE_UNDEF; It would be more elegant if we could distinguish different failure modes from update_multipath_strings() directly, without having to call do_dm_get_info() again. > } > /* if update_multipath_strings orphaned the path, quit early */ > @@ -2179,12 +2180,8 @@ check_path (struct vectors * vecs, struct path > * pp, unsigned int ticks) > /* > * reinstate this path > */ > - if (!disable_reinstate && reinstate_path(pp)) { > - condlog(3, "%s: reload map", pp->dev); > - ev_add_path(pp, vecs, 1); > - pp->tick = 1; > - return 0; > - } > + if (!disable_reinstate) > + reinstate_path(pp); > new_path_up = 1; > > if (oldchkrstate != PATH_UP && oldchkrstate != > PATH_GHOST) > @@ -2200,15 +2197,10 @@ check_path (struct vectors * vecs, struct > path * pp, unsigned int ticks) > else if (newstate == PATH_UP || newstate == PATH_GHOST) { > if ((pp->dmstate == PSTATE_FAILED || > pp->dmstate == PSTATE_UNDEF) && > - !disable_reinstate) { > + !disable_reinstate) > /* Clear IO errors */ > - if (reinstate_path(pp)) { > - condlog(3, "%s: reload map", pp->dev); > - ev_add_path(pp, vecs, 1); > - pp->tick = 1; > - return 0; > - } > - } else { > + reinstate_path(pp); > + else { > LOG_MSG(4, verbosity, pp); > if (pp->checkint != max_checkint) { > /* -- Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107 SUSE Software Solutions Germany GmbH HRB 36809, AG Nürnberg GF: Felix Imendörffer -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel