On Sat, Dec 07, 2024 at 12:36:08AM +0100, Martin Wilck wrote: > update_pathvec_from_dm() may set mpp->need_reload if it finds inconsistent > settings. In this case, the map should be reloaded, but so far we don't > do this reliably. A previous patch added a call to reload_and_sync_map() > in the CHECKER_FINISHED state, but in the mean time the checker may have > waited for checker threads to finish, and may have dropped and re-acquired the > vecs lock. As mpp->need_reload is a serious but rare condition, also try > to fix it early in the checker loop. Because of the previous patch, we > can call reload_and_sync_map() here. Again, if the map has any paths in the INIT_PARTIAL or INIT_REMOVED state, the reload will remove them from the pathvec and mess up our looping through it. Reloading maps with need_reload will already happen at the end of this path check loop, so it will still get done very quickly. Also, since we try to get all the paths for a device checked in the same checker loop now, it seems to me that there could a benefit to waiting till the end, to sync our state with the kernel. This could avoid some path ping-ponging in a corner case. If a path failed and the kernel noticed it while we are in a checker loop, but haven't gotten to update_path_state() for that path yet, the path will still appear up to multipathd. In this case, if we reloaded the multipath device while updating an earlier path in the device, we would reinstate this failed path in sync_map_state(), only for us or the kernel to fail it again right afterwards. -Ben > Fixes: https://github.com/opensvc/multipath-tools/issues/105 > Signed-off-by: Martin Wilck <mwilck@xxxxxxxx> > --- > multipathd/main.c | 15 ++++++++++++++- > 1 file changed, 14 insertions(+), 1 deletion(-) > > diff --git a/multipathd/main.c b/multipathd/main.c > index 131dab6..18083c7 100644 > --- a/multipathd/main.c > +++ b/multipathd/main.c > @@ -2450,12 +2450,25 @@ get_new_state(struct path *pp) > static int > do_sync_mpp(struct vectors *vecs, struct multipath *mpp) > { > - int ret; > + const int MAX_RETRIES = 1; > + int ret, retry = 0; > > +try_again: > ret = refresh_multipath(vecs, mpp); > if (ret) > return ret; > > + else if (mpp->need_reload) { > + if (retry++ < MAX_RETRIES) { > + condlog(2, "%s: %s needs reload", __func__, mpp->alias); > + if (reload_and_sync_map(mpp, vecs) == 2) > + /* map removed */ > + return 1; > + goto try_again; > + } else > + condlog(1, "%s: %s still needs reload after %d retries", > + __func__, mpp->alias, retry); > + } > set_no_path_retry(mpp); > return 0; > } > -- > 2.47.0