On Tue, 2024-12-10 at 17:49 -0500, Benjamin Marzinski wrote: > On Tue, Dec 10, 2024 at 10:05:14PM +0100, Martin Wilck wrote: > > On Tue, 2024-12-10 at 14:02 -0500, Benjamin Marzinski wrote: > > > On Sat, Dec 07, 2024 at 12:36:07AM +0100, Martin Wilck wrote: > > > > We previously didn't allow map removal inside the checker loop. > > > > But > > > > with the late updates to the checkerloop code, it should be > > > > safe to > > > > orphan > > > > paths and delete maps even in this situation. We remove such > > > > maps > > > > everywhere > > > > else in the code already, whenever refresh_multipath() or > > > > setup_multipath() > > > > is called. > > > > > > I don't think that this is safe. It's possible that the multipath > > > device > > > has paths in the INIT_REMOVED or INIT_PARTIAL state. These will > > > get > > > silently removed from the pathvec if we remove the map here. This > > > will > > > mess up our iteration through the pathvec in update_paths(). > > > > Hm. You're right. But that applies to the current code in 0.11.0 PR > > as > > well, because we'd call > > > > do_sync_mpp() > > update_multipath_strings() > > sync_paths() > > check_removed_paths() > > vector_del_slot(pathvec, i--); > > > > Or am I missing something? > > Nope. Your right. Nuts. > > > It seems to me that the only safe way to handle this is to refrain > > from > > deleting paths from the pathvec anywhere deep in the call stack. > > Even > > if we can avoid this situation now by moving the sync towards the > > end > > of the checker loop, I believe that in the long run we need to fix > > these traps in our code, because it's just so easy to get this > > wrong. > > > > I wonder if we need yet another path state, of if we could simply > > set > > these entries in the pathvec to NULL. That sounds crazy, but it > > might > > actually be doable. Not 0.11.0 material, though. > > I think we could just not call check_removed_paths() in sync_paths(). > We > would still orphan all the paths that were no longer part of the > multipath device, and set pp->mpp for all the paths that are part of > the > device just like before, but we wouldn't delete the paths from > pathvec there. Instead we would call check_removed_paths() in > refresh_multipath(), so we did it after loads and in > update_multipath. > > I'm pretty sure that should be fine. If the device table changed and > removed a path so that we can free it, either we reloaded the device, > and we will call setup_multipath() after the reload, or something > external did, and multipathd will see an event for that and call > setup_multipath() via update_multipath(). > > Does that make sense? I think we'll be fine with my upcoming patch set, which will call reload_and_sync_map() only from checker_finished(). I don't change sync_paths() in this set so far. I'm a little concerned about refresh_multipath() and reload_and_sync_map() being called from various CLI functions. But I won't start digging into that now. Maps may get removed in CLI calls, so what. Thanks, Martin