This patch set goes on top of Ben's set [1] for github issue 105 [2]. Changes v2 -> v3 (all based on the review by Ben Marzinski): 03/15: See 15/15 below 04/15: replaced the previous patch, which decreased the checker interval to 1 if the map was still inconsistent after reloading, by a warning. It's unlikely that this condition will ever occur, but if it does, we want to know. 08/15: Re-added the call to switch_pathgroups which had been mistakenly deleted. 15/15: New, compensate the fact that it is now more likely to miss the call to enable_group() in update_path_state(). Changes v1 -> v2: Removed patch 3 and 4 of v1 and replaced them by an alternative approach. Instead of allowing map and path removal in the checker loop, the kernel sync is moved towards the end. Patch 5 ff. in v1 contained a bug in the new checker_finished() function. If one "tick" function returned true, the other might not be executed any more. In v2, all tick functions are executed, and the action to be taken is selected according to the combined results. Also, we won't call reload_and_sync_map() when we've already called update_map(). Patch 13 is new in v2. Patch 8 from v1 has been moved after patch 13. (In the thread following the review of v1, I mistakenly wrote about an upcoming "v4" of this patch set. That was wrong, I meant this v2 here). Cover letter of v1: The first patch implements the remark I had on patch 2 on Ben's set, the 2nd is a minor cleanup. Patch 3 moves the sync_mpp() call from the beginning to the end of the checkerloop, as suggested by Ben in [3]. If an inconsitency is found (mpp->need_reload), we reload the map and schedule another sync for the next tick (patch 4). Patch 5 ff. reshuffle the code in checkerloop(). There is now one function, checker_finished(), that takes all actions that need to be done with the vecs lock taken after the checkers have finished. checkerloop() enters this function immediately when the checkers have finished, without dropping and re-acquiring the vecs lock. The map reload logic is completely handled in this function. The various _tick() functions don't loop over mpvec any more; they are now just called for a single mpp, and they simply return true if a map reload is required. The actual reload action differs: if missing_uev_wait_tick() requests a reload, it needs to be a full update_map() (which calls adopt_paths()), whereas in the other cases, reload_and_sync_map() is sufficient. Patch 12 changes the reload action for the ghost delay tick. Patch 13 removes maps if that are not found in the kernel any more. This obsoletes the map garbage collector. Unlike the logic in v1, we won't remove maps on arbitrary error conditions any more. Reviews & comments welcome. Regards Martin [1] https://lore.kernel.org/dm-devel/20241205035638.1218953-1-bmarzins@xxxxxxxxxx/ [2] https://github.com/opensvc/multipath-tools/issues/105 [3] https://lore.kernel.org/dm-devel/Z1iUekRg8sai8HLT@xxxxxxxxxx/ Martin Wilck (15): multipathd: don't reload map in update_mpp_prio() multipathd: remove dm_get_info() call from refresh_multipath() multipathd: sync maps at end of checkerloop multipathd: emit a warning if a map remains inconsistent after reload multipathd: move yielding for waiters to start of checkerloop multipathd: add checker_finished() multipathd: move "tick" calls into checker_finished() multipathd: don't call reload_and_sync_map() from deferred_failback_tick() multipathd: move retry_count_tick() into existing mpvec loop multipathd: don't call update_map() from missing_uev_wait_tick() multipathd: don't call udpate_map() from ghost_delay_tick() multipathd: only call reload_and_sync_map() when ghost delay expires multipathd: remove non-existent maps in checkerloop multipathd: remove mpvec_garbage_collector() multipathd: enable pathgroups in checker_finished() libmultipath/structs.h | 2 +- libmultipath/structs_vec.c | 1 - multipathd/main.c | 317 +++++++++++++++++-------------------- 3 files changed, 149 insertions(+), 171 deletions(-) -- 2.47.1