On Tue, Dec 10, 2024 at 10:05:14PM +0100, Martin Wilck wrote: > On Tue, 2024-12-10 at 14:02 -0500, Benjamin Marzinski wrote: > > On Sat, Dec 07, 2024 at 12:36:07AM +0100, Martin Wilck wrote: > > > We previously didn't allow map removal inside the checker loop. But > > > with the late updates to the checkerloop code, it should be safe to > > > orphan > > > paths and delete maps even in this situation. We remove such maps > > > everywhere > > > else in the code already, whenever refresh_multipath() or > > > setup_multipath() > > > is called. > > > > I don't think that this is safe. It's possible that the multipath > > device > > has paths in the INIT_REMOVED or INIT_PARTIAL state. These will get > > silently removed from the pathvec if we remove the map here. This > > will > > mess up our iteration through the pathvec in update_paths(). > > Hm. You're right. But that applies to the current code in 0.11.0 PR as > well, because we'd call > > do_sync_mpp() > update_multipath_strings() > sync_paths() > check_removed_paths() > vector_del_slot(pathvec, i--); > > Or am I missing something? Nope. Your right. Nuts. > It seems to me that the only safe way to handle this is to refrain from > deleting paths from the pathvec anywhere deep in the call stack. Even > if we can avoid this situation now by moving the sync towards the end > of the checker loop, I believe that in the long run we need to fix > these traps in our code, because it's just so easy to get this wrong. > > I wonder if we need yet another path state, of if we could simply set > these entries in the pathvec to NULL. That sounds crazy, but it might > actually be doable. Not 0.11.0 material, though. I think we could just not call check_removed_paths() in sync_paths(). We would still orphan all the paths that were no longer part of the multipath device, and set pp->mpp for all the paths that are part of the device just like before, but we wouldn't delete the paths from pathvec there. Instead we would call check_removed_paths() in refresh_multipath(), so we did it after loads and in update_multipath. I'm pretty sure that should be fine. If the device table changed and removed a path so that we can free it, either we reloaded the device, and we will call setup_multipath() after the reload, or something external did, and multipathd will see an event for that and call setup_multipath() via update_multipath(). Does that make sense? > > Perhaps a > > better idea would be to set mpp->sync_ticks to 0 if > > update_multipath_strings() fails in do_sync_mpp(). This would force a > > refresh by sync_mpp() at the start of the next loop in checkerloop(), > > where it can safely remove the multipath device. > > I like the idea of your other post to move the sync to the > CHECKER_FINISHED state. > > Thanks, > Martin >