On Wed, 2024-12-04 at 22:56 -0500, Benjamin Marzinski wrote: > update_pathvec_from_dm() may set mpp->need_reload if it finds > inconsistent settings. In this case, the map should be reloaded, but > so > far we don't do this reliably. Add a call to reload_and_sync_map() > when updating the maps at the end of a loop in checkerloop(). > > Suggested-by: Martin Wilck <mwilck@xxxxxxxx> > Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx> > --- > multipathd/main.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/multipathd/main.c b/multipathd/main.c > index fcab1ed3..c1ae44f2 100644 > --- a/multipathd/main.c > +++ b/multipathd/main.c > @@ -3038,7 +3038,10 @@ checkerloop (void *ap) > > start_time.tv_sec); > if (checker_state == CHECKER_FINISHED) { > vector_foreach_slot(vecs->mpvec, > mpp, i) { > - if (update_mpp_prio(vecs, > mpp) == 2) { > + if (update_mpp_prio(vecs, > mpp) == 2 || > + (mpp->need_reload && > + mpp->synced_count > 0 > && > + > reload_and_sync_map(mpp, vecs) == 2)) { > /* multipath device > deleted */ > i--; > The general logic is ok, but as noted on https://github.com/opensvc/multipath-tools/issues/105), I believe that do_sync_mpp() is the more natural point in the code to attempt the reload, but I think it doesn't hurt to do both. What I don't like so much about this patch is that from this point in the code, we can call reload_and_sync_map() directly, or indirectly through update_mpp_prio(). I find that confusing. I think we should change update_mpp_prio() to just set the need_reload() flag, and then simply look at the flag in checkerloop(). The "minor" actions in update_mpp_prio, such as switching path group, can still be done directly from there. I can send a patch on top of yours if that's ok with you. I have some questions. I've spent some time today pulling my hair over the many functions in the multipath-tools code that are related to syncing states. I'm kind of wound up in my own thinking, perhaps you can help me break out of the loop :-) 1) I observe that update_map() and reload_and_sync_map() are very similar. They both set up the map from multipathd's internal state, reload it, and then update multipathd's state. The differences are as follows: - update_map() calls adopt_paths() and verify_paths() before setup_map(), while reload_map() only calls update_mpp_paths(). - It looks like an oversight to me that we don't call update_mpp_paths() in update_map(), what do you think? - What about verify_paths(), why do we call it only in the update_map() code path? - update_map() retries, while reload_and_sync_map() does not. - update_map() calls the persistent-reservation related code. These differences are few enough I think update_map() could actually call reload_and_sync_map() as a helper. 2) update_map() is called from missing_uev_wait_tick() and ghost_delay_tick(), while deferred_failback_tick() and the update_mpp_prio() code path above "only" call reload_and_sync_map(). Is this intentional? If yes, I am missing the rationale (I believe that just calling reload_and_sync_map() from all the _tick() functions would be sufficient, but I may be overlooking something). I'd love it if we could integrate the various _tick() calls with the rest of the map-reload logic. Ideally the checkerloop and the tick() functions would just set the need_reload flag, and then we'd do the reload once at the end of the loop. Regards, Martin