Re: [PATCH 3/7] multipathd: refresh multipath before calling set_no_path_retry()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jul 02, 2024 at 09:34:17AM +0200, Martin Wilck wrote:
> On Mon, 2024-07-01 at 14:41 -0400, Benjamin Marzinski wrote:
> > 
> > 
> > 
> > My purpose in writing 0f850db wasn't to cut down on ioctls. It was to
> > keep set_no_path_retry() from incorrectly handling queueing in cases
> > where no_path_retry was set to a number, and the device had no usable
> > paths.  My purpose in this patch is also to make sure
> > set_no_path_retry() does the right thing. Obviously, this one is more
> > of
> > a corner case, and if instead we were occasionally resyncing with the
> > kernel even when a multipath device had no paths, that would solve
> > this
> > issue, and I'd be fine with dropping this patch.
> 
> That sounds good.
> 
> > So unless it conflicts with work you're already doing, I'll send a
> > new
> > patchset that changes checkerloop() to pull the multipath device
> > updating work out of check_path(). I'm not sure that it makes a lot
> > of
> > sense to change how we loop through the paths, since check_path()
> > does
> > work on some paths that aren't assigned to a multipath device. 
> > Plus
> > different paths in a multipath device will need checking at different
> > times depending on their states anyway, so you can't just go through
> > all
> > the paths at once.
> 
> While I agree that syncing the map state just once per checkerloop and
> map is the right thing, I believe we should do it as soon as we have
> checked all paths that need to be checked for the given map. And that's
> most easily done by looping over the mpath vector, like this:
> 
> for each mpp
>     for each path in mpp
>         if path needs to be checked:
>             check path
>     sync mpp

Huh? Are you suggesting syncing the mpp state every second? That would
be an increase in the number of ioctls we're doing in many cases. For
instance, a 4 path device with all paths working and a default
configuration will sync the map 4 times for every 20 seconds.
Unfortunately, it's possible that all these syncs would happend in a
clump and then we'd wait 20 seconds to rapidly sync 4 times again.  An 8
path device with all paths down would sync 8 times every 5 seconds. In
this case syncing once a second would be less work, but in general this
looks like multipathd doing more ioctls. 

The primary purpose of these periodic syncs is to protect us from
external changes that don't trigger a dm event. This is a rare enough
thing that syncing every max_checkint seems often enough. So is "sync
mpp" just shorthand for "sync mpp if it needs to be synced", or is there
some benefit that I'm overlooking to sync every second?

> 
> for each path
>     if (!pp->mpp and path needs to be checked)
>           check path
> 
> Of course we could do this in multiple steps and move the mpp sync out
> of the path loop first, like you suggested. But if we do this, there
> may be a considerable time delay between checking the paths of a given
> map and syncing the related mpp state, especially on systems with a
> large number of devices, and that might be problematic.
>
> Path status changes are asynchronous by nature, and thus whatever we do
> will not be perfect. IMO it makes sense to do all checks related to a
> given map in as short a time interval as possible. This way we are most
> likely to get a consistent picture of the current state of the map in
> question in the kernel.

I'm not against your idea, but the way to code is now, even doing your
mpp loop doesn't really guarantee that. Just because all the paths get
checked every 20 seconds, doesn't mean that they will be ready to be
checked on the same second. For paths that get added after the multipath
device is created, this is often not the case. And even if the paths
start in sync, there are multiple reasons why they can drop out of sync.

But it we wanted to, we probably could fix that and space the checkers
out better at the same time. I don't think it would be that hard to make
the paths change their next check tick by a second every so often so that
all the paths in the same state from a given multipath device run their
checkers at the same time. We could even make the different multipath
devices spread out when they are checked, so that they aren't trying to
all run their checkers at once.

I'll take a stab at this in my updated patchset.

-Ben

> In the long run, I think what we should do is use asynchronous checkers
> exclusively. And instead of waiting a millisecond for them to complete
> synchronously, we should trigger the path check on all paths (that need
> to be checked) in one go, then wait for all checkers to complete (this
> is the difficult part, I know), and then go over all the results, again
> without waiting for anything.
> 
> Another project I've been wanting to do for a long time :-/
> 
> Regards
> Martin
> 





[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux