On Sat, Dec 07, 2024 at 12:36:07AM +0100, Martin Wilck wrote:
> We previously didn't allow map removal inside the checker loop. But
> with the late updates to the checkerloop code, it should be safe to orphan
> paths and delete maps even in this situation. We remove such maps everywhere
> else in the code already, whenever refresh_multipath() or setup_multipath()
> is called.

Actually, thinking about this more, what do we gain by proactively
deleting the multipath device if something goes wrong in the checker? If
we successfully reload a device but can't sync it with the kernel,
that's one thing. But that was triggered by a change in the device, and
we know that device-mapper was working when we reloaded it. I'm leery of
possibly deleting the map because of a transient device-mapper issue.

I'm not sure we should delete the device on an error from a check that
we do repeatedly. We haven't done so in the past, and as far as I know,
that hasn't caused problems. Without a benefit to doing this, I'm not
sure it makes sense.

-Ben

>
> Signed-off-by: Martin Wilck <mwilck@xxxxxxxx>
> ---
>  multipathd/main.c | 43 ++++++++++++++++++++-----------------------
>  1 file changed, 20 insertions(+), 23 deletions(-)
>
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 4a28fbb..131dab6 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -2446,34 +2446,30 @@ get_new_state(struct path *pp)
>  	return newstate;
>  }
>
> -static void
> -do_sync_mpp(struct vectors * vecs, struct multipath *mpp)
> +/* Returns true if the mpp was deleted */
> +static int
> +do_sync_mpp(struct vectors *vecs, struct multipath *mpp)
>  {
> -	int i, ret;
> -	struct path *pp;
> +	int ret;
> +
> +	ret = refresh_multipath(vecs, mpp);
> +	if (ret)
> +		return ret;
>
> -	ret = update_multipath_strings(mpp, vecs->pathvec);
> -	if (ret != DMP_OK) {
> -		condlog(1, "%s: %s", mpp->alias, ret == DMP_NOT_FOUND ?
> -			"device not found" :
> -			"couldn't synchronize with kernel state");
> -		vector_foreach_slot (mpp->paths, pp, i)
> -			pp->dmstate = PSTATE_UNDEF;
> -		return;
> -	}
>  	set_no_path_retry(mpp);
> +	return 0;
>  }
>
> -static void
> +static int
>  sync_mpp(struct vectors * vecs, struct multipath *mpp, unsigned int ticks)
>  {
>  	if (mpp->sync_tick)
>  		mpp->sync_tick -= (mpp->sync_tick > ticks) ? ticks :
>  			mpp->sync_tick;
>  	if (mpp->sync_tick)
> -		return;
> +		return 0;
>
> -	do_sync_mpp(vecs, mpp);
> +	return do_sync_mpp(vecs, mpp);
>  }
>
>  static int
> @@ -2513,12 +2509,10 @@ update_path_state (struct vectors * vecs, struct path * pp)
>  		return handle_path_wwid_change(pp, vecs)? CHECK_PATH_REMOVED :
>  			CHECK_PATH_SKIPPED;
>  	}
> -	if (pp->mpp->synced_count == 0) {
> -		do_sync_mpp(vecs, pp->mpp);
> +	if (pp->mpp->synced_count == 0 && do_sync_mpp(vecs, pp->mpp))
>  		/* if update_multipath_strings orphaned the path, quit early */
> -		if (!pp->mpp)
> -			return CHECK_PATH_SKIPPED;
> -	}
> +		return CHECK_PATH_SKIPPED;
> +
>  	if ((newstate != PATH_UP && newstate != PATH_GHOST &&
>  	    newstate != PATH_PENDING) && (pp->state == PATH_DELAYED)) {
>  		/* If path state become failed again cancel path delay state */
> @@ -3018,8 +3012,11 @@ checkerloop (void *ap)
>  			mpp->synced_count = 0;
>  		if (checker_state == CHECKER_STARTING) {
>  			vector_foreach_slot(vecs->mpvec, mpp, i) {
> -				sync_mpp(vecs, mpp, ticks);
> -				mpp->prio_update = PRIO_UPDATE_NONE;
> +				if (sync_mpp(vecs, mpp, ticks))
> +					/* map deleted */
> +					i--;
> +				else
> +					mpp->prio_update = PRIO_UPDATE_NONE;
>  			}
>  			vector_foreach_slot(vecs->pathvec, pp, i)
>  				pp->is_checked = CHECK_PATH_UNCHECKED;
> --
> 2.47.0