Re: [PATCH 2/3] multipathd: reload maps in checkerloop() if necessary.

Martin Wilck <mwilck@xxxxxxxx> · Thu, 05 Dec 2024 19:41:28 +0100

On Wed, 2024-12-04 at 22:56 -0500, Benjamin Marzinski wrote:
> update_pathvec_from_dm() may set mpp->need_reload if it finds
> inconsistent settings. In this case, the map should be reloaded, but
> so
> far we don't do this reliably. Add a call to reload_and_sync_map()
> when updating the maps at the end of a loop in checkerloop().
> 
> Suggested-by: Martin Wilck <mwilck@xxxxxxxx>
> Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx>
> ---
>  multipathd/main.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/multipathd/main.c b/multipathd/main.c
> index fcab1ed3..c1ae44f2 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -3038,7 +3038,10 @@ checkerloop (void *ap)
>  							    
> start_time.tv_sec);
>  			if (checker_state == CHECKER_FINISHED) {
>  				vector_foreach_slot(vecs->mpvec,
> mpp, i) {
> -					if (update_mpp_prio(vecs,
> mpp) == 2) {
> +					if (update_mpp_prio(vecs,
> mpp) == 2 ||
> +					    (mpp->need_reload &&
> +					     mpp->synced_count > 0
> &&
> +					    
> reload_and_sync_map(mpp, vecs) == 2)) {
>  						/* multipath device
> deleted */
>  						i--;
> 

The general logic is ok, but as noted on
https://github.com/opensvc/multipath-tools/issues/105), I believe that
do_sync_mpp() is the more natural point in the code to attempt the
reload, but I think it doesn't hurt to do both.

What I don't like so much about this patch is that from this point in
the code, we can call reload_and_sync_map() directly, or indirectly
through update_mpp_prio(). I find that confusing. I think we should
change update_mpp_prio() to just set the need_reload() flag, and then
simply look at the flag in checkerloop(). The "minor" actions in
update_mpp_prio, such as switching path group, can still be done
directly from there.

I can send a patch on top of yours if that's ok with you.

I have some questions. I've spent some time today pulling my hair over
the many functions in the multipath-tools code that are related to 
syncing states. I'm kind of wound up in my own thinking, perhaps you
can help me break out of the loop :-)

1) I observe that update_map() and reload_and_sync_map() are very
similar. They both set up the map from multipathd's internal state,
reload it, and then update multipathd's state. The differences are as
follows:
   - update_map() calls adopt_paths() and verify_paths() before
setup_map(), while reload_map() only calls update_mpp_paths(). 
     - It looks like an oversight to me that we don't call
update_mpp_paths() in update_map(), what do you think?
     - What about verify_paths(), why do we call it only in the
update_map() code path?
   - update_map() retries, while reload_and_sync_map() does not.
   - update_map() calls the persistent-reservation related code.
These differences are few enough I think update_map() could actually
call reload_and_sync_map() as a helper.

2) update_map() is called from missing_uev_wait_tick() and
ghost_delay_tick(), while deferred_failback_tick() and the
update_mpp_prio() code path above "only" call reload_and_sync_map().
Is this intentional? If yes, I am missing the rationale (I believe that
just calling reload_and_sync_map() from all the _tick() functions would
be sufficient, but I may be overlooking something).

I'd love it if we could integrate the various _tick() calls with the
rest of the map-reload logic. Ideally the checkerloop and the tick()
functions would just set the need_reload flag, and then we'd do the
reload once at the end of the loop.

Regards,
Martin