On Thu, 2024-09-12 at 17:49 -0400, Benjamin Marzinski wrote: > This patchset cleans up some of the ugly code from my last refactor, > and > speeds up multipathd path checking, by refactoring how paths are > checked. > > multipathd previously ran the checker on a path, waited briefly for > it > to complete, and if it did, updated the path and the priorities for > the > multipath device and possibly reloaded it. > > This had multiple issues. > 1. All the paths waited for their checkers to complete serially. This > wasted time with no benefit. It also meant that it wasn't that > uncommon > for a checker to not finish in time, and get marked pending. > > 2. The prioritiers could get run on paths multiple times during a > checker loop if multiple paths were restored at the same time, which > is > a common occurance. > > 3. In an effort to keep a multipath device's state consistent despite > the delays introduced by waiting for the path checkers, paths were > checked by multipath device. However checking the paths could involve > reloading a multipath device table, or even removing a multipath > device. > This added some ugly backout and retry logic to the path check code. > > With this patchset, the code now first starts the path checkers on > all > devices due for a path check. If there are path checkers that need > waiting it then unocks the vecs lock and waits 5ms. Next, it goes > back > and updates the state of all the path devices. Once all of the paths > have been updated, it goes through each multipath device, updating > path > priorities and reloading the device as necessary. > > This allows multipathd to spend less time and do less redundant work > while checking paths, while making paths more likely to not spend a > checker cycle marked as pending. Since multipathd drops the vecs lock > while waiting, uevents and user commands can be processed while its > waiting. > > Since there isn't a delay waiting for the previous checker before > starting the next one, the path checker code has reverted to checking > all paths in pathvec order, instead of by multipath device. Updating > the > paths in pathvec order simplifies the code, since the multipath > devices > can change during path updates. Starting the checkers in pathvec > order > keeps a path from having its checker started towards the end of the > list > of paths but checking for the result towards the start of the list of > paths. > > Changes since v1: > > The original patches have been updated based on suggestions from > Martin > Wilck (patches listed using their original numbering): > > - Moved checker path_state initialization code from 06 to 01 > - Check for NULL before dereference in checker_get_state() in 04 > - Moved adding start_checker to version file from 07 to 06 > - Renamed check_path_state() to get_new_state() in 08. updated 09, > 11, > and 12 to deal with the name change. > - Added new patch before 13 ('fix "fail path" and "reinstate path" > commands') to fix issues with manually failing and reinstating > paths > between when the checkers are run and the paths are updated > - Fixed check in update_paths in 13 > - Split patch 13 into three patches, one to move priority updating > out > of the individual path updating code, and two others to clean up > some > of the logic about updating priorities and path groups. > > After the now 18 patches from the original patchset, four new patches > have been added to move the waiting for pending paths out from > checker_get_state() and into the existing waiting code in > checkerloop(). > Doing the waiting outside of the checkers (in fact, outside of the > vecs lock completely) means that we can't wait till a checker has > completed. We must just pick a time and wait for all of it. On > Martin's > suggestion I picked 5ms. Since multipathd isn't sleeping with the > vecs > lock held, it's much less critical to do this quickly. > > For checker run in pathinfo, I kept the previous timeout of 1ms, but > I > waffled on whether to wait at all here, and I'd be perfectly fine > with > removing the wait altogether, and just let the paths be pending when > called from multipathd with async checkers. > > Also, I've just added these patches to the end of the patchset to > make > it easier to see the changes from v1, but if people want less churn > in > the checker code commits, I'd be find with resending this patchset > with > these patches refactored into the earlier patches. Except for the few minor nits in my previous replies: Reviewed-by: Martin Wilck <mwilck@xxxxxxxx>