On Thu, 2024-12-19 at 16:50 -0500, Benjamin Marzinski wrote: > On Wed, Dec 11, 2024 at 11:58:58PM +0100, Martin Wilck wrote: > > Rather than calling sync_mpp early in the checkerloop and tracking > > map synchronization with synced_count, call sync_mpp() in the > > CHECKER_FINISHED > > state, if either at least one path of the map has been checked in > > the current > > iteration, or the sync tick has expired. This avoids potentially > > deleting > > paths from the pathvec through the do_sync_mpp() -> > > update_multipath_strings() > > -> sync_paths -> check_removed_paths() call chain while we're > > iterating over > > the pathvec. Also, the time gap between obtaining path states and > > syncing > > the state with the kernel is smaller this way. > > > > Sorry for the delayed review. I've been busy lately and there was a > lot > to look at with moving the syncs from before we check the paths till > the > end. Turns out I'm mostly o.k. with this. Syncing right before we > checked the paths still left a small window where things could change > in > the kernel state, and we wouldn't see them before we checked the > paths. > It's just a larger window now. > > The one place where we won't just pick up the change on the next > checker > loop is enable_group(). If the kernel disables a pathgroup, > multipathd > would re-enable it if a path in it switched states to PATH_UP. I'm > not > totally sure how necessary this is. The kenel has code to re-enable > the > pathgroups. But it's been there a while, and perhaps it does avoid an > issue. > > The kernel doesn't even necessarily send an event if it disables a > pathgroup (since the pathgroup isn't actually disabled, it's just > bypassed, which means all the other pathgroups are checked first). > So > there is a real chance that since the last path check, the pgp- > >status > could have changed. By not checking before we check the path, we > could > not realize that tht pgp is in PGSTATE_DISABLED when a path switches > states to PATH_UP. It shouldn't be that hard to check in sync_mpp() > if > there are any disabled path_groups the include a path where > pp->is_checked == CHECK_PATH_NEW_UP. So we could move the > enable_group() > code to checker_finished() as well and fix this. Right. I'll add a patch on top of the series. Thanks for spotting this. Martin