On Tue, 2024-12-10 at 14:02 -0500, Benjamin Marzinski wrote: > On Sat, Dec 07, 2024 at 12:36:07AM +0100, Martin Wilck wrote: > > We previously didn't allow map removal inside the checker loop. But > > with the late updates to the checkerloop code, it should be safe to > > orphan > > paths and delete maps even in this situation. We remove such maps > > everywhere > > else in the code already, whenever refresh_multipath() or > > setup_multipath() > > is called. > > I don't think that this is safe. It's possible that the multipath > device > has paths in the INIT_REMOVED or INIT_PARTIAL state. These will get > silently removed from the pathvec if we remove the map here. This > will > mess up our iteration through the pathvec in update_paths(). Hm. You're right. But that applies to the current code in 0.11.0 PR as well, because we'd call do_sync_mpp() update_multipath_strings() sync_paths() check_removed_paths() vector_del_slot(pathvec, i--); Or am I missing something? It seems to me that the only safe way to handle this is to refrain from deleting paths from the pathvec anywhere deep in the call stack. Even if we can avoid this situation now by moving the sync towards the end of the checker loop, I believe that in the long run we need to fix these traps in our code, because it's just so easy to get this wrong. I wonder if we need yet another path state, of if we could simply set these entries in the pathvec to NULL. That sounds crazy, but it might actually be doable. Not 0.11.0 material, though. > Perhaps a > better idea would be to set mpp->sync_ticks to 0 if > update_multipath_strings() fails in do_sync_mpp(). This would force a > refresh by sync_mpp() at the start of the next loop in checkerloop(), > where it can safely remove the multipath device. I like the idea of your other post to move the sync to the CHECKER_FINISHED state. Thanks, Martin