Re: [PATCH v2 00/22] Yet Another path checker refactor

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2024-09-12 at 17:49 -0400, Benjamin Marzinski wrote:
> This patchset cleans up some of the ugly code from my last refactor,
> and
> speeds up multipathd path checking, by refactoring how paths are
> checked.
> 
> multipathd previously ran the checker on a path, waited briefly for
> it
> to complete, and if it did, updated the path and the priorities for
> the
> multipath device and possibly reloaded it.
> 
> This had multiple issues.
> 1. All the paths waited for their checkers to complete serially. This
> wasted time with no benefit. It also meant that it wasn't that
> uncommon
> for a checker to not finish in time, and get marked pending.
> 
> 2. The prioritiers could get run on paths multiple times during a
> checker loop if multiple paths were restored at the same time, which
> is
> a common occurance.
> 
> 3. In an effort to keep a multipath device's state consistent despite
> the delays introduced by waiting for the path checkers, paths were
> checked by multipath device. However checking the paths could involve
> reloading a multipath device table, or even removing a multipath
> device.
> This added some ugly backout and retry logic to the path check code.
> 
> With this patchset, the code now first starts the path checkers on
> all
> devices due for a path check. If there are path checkers that need
> waiting it then unocks the vecs lock and waits 5ms. Next, it goes
> back
> and updates the state of all the path devices. Once all of the paths
> have been updated, it goes through each multipath device, updating
> path
> priorities and reloading the device as necessary.
> 
> This allows multipathd to spend less time and do less redundant work
> while checking paths, while making paths more likely to not spend a
> checker cycle marked as pending. Since multipathd drops the vecs lock
> while waiting, uevents and user commands can be processed while its
> waiting.
> 
> Since there isn't a delay waiting for the previous checker before
> starting the next one, the path checker code has reverted to checking
> all paths in pathvec order, instead of by multipath device. Updating
> the
> paths in pathvec order simplifies the code, since the multipath
> devices
> can change during path updates. Starting the checkers in pathvec
> order
> keeps a path from having its checker started towards the end of the
> list
> of paths but checking for the result towards the start of the list of
> paths.
> 
> Changes since v1:
> 
> The original patches have been updated based on suggestions from
> Martin
> Wilck (patches listed using their original numbering):
> 
> - Moved checker path_state initialization code from 06 to 01
> - Check for NULL before dereference in checker_get_state() in 04
> - Moved adding start_checker to version file from 07 to 06
> - Renamed check_path_state() to get_new_state() in 08. updated 09,
> 11,
>   and 12 to deal with the name change.
> - Added new patch before 13 ('fix "fail path" and "reinstate path"
>   commands') to fix issues with manually failing and reinstating
> paths
>   between when the checkers are run and the paths are updated
> - Fixed check in update_paths in 13
> - Split patch 13 into three patches, one to move priority updating
> out
>   of the individual path updating code, and two others to clean up
> some
>   of the logic about updating priorities and path groups.
> 
> After the now 18 patches from the original patchset, four new patches
> have been added to move the waiting for pending paths out from
> checker_get_state() and into the existing waiting code in
> checkerloop().
> Doing the waiting outside of the checkers (in fact, outside of the
> vecs lock completely) means that we can't wait till a checker has
> completed. We must just pick a time and wait for all of it. On
> Martin's
> suggestion I picked 5ms. Since multipathd isn't sleeping with the
> vecs
> lock held, it's much less critical to do this quickly.
> 
> For checker run in pathinfo, I kept the previous timeout of 1ms, but
> I
> waffled on whether to wait at all here, and I'd be perfectly fine
> with
> removing the wait altogether, and just let the paths be pending when
> called from multipathd with async checkers.
> 
> Also, I've just added these patches to the end of the patchset to
> make
> it easier to see the changes from v1, but if people want less churn
> in
> the checker code commits, I'd be find with resending this patchset
> with
> these patches refactored into the earlier patches.

Except for the few minor nits in my previous replies:

Reviewed-by: Martin Wilck <mwilck@xxxxxxxx>






[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux