Re: [PATCH] multipathd: Make sure to disable queueing if recovery has failed.

On Tue, Dec 05, 2023 at 03:57:13PM +0100, Martin Wilck wrote:
> On Mon, 2023-12-04 at 18:01 -0500, Benjamin Marzinski wrote:
> > On Mon, Dec 04, 2023 at 09:43:58PM +0100, Martin Wilck wrote:
> > > On Mon, 2023-12-04 at 15:00 -0500, Benjamin Marzinski wrote:
> > > > On Mon, Dec 04, 2023 at 03:38:03PM +0100, Martin Wilck wrote:
> > > > > On Mon, 2023-11-27 at 16:54 -0500, Benjamin Marzinski wrote:
> > > > > > If a multipath device has no_path_retry set to a number and
> > > > > > has
> > > > > > lost
> > > > > > all
> > > > > > paths, gone into recovery mode, and timed out, it will
> > > > > > disable
> > > > > > queue_if_no_paths. After that, if one of those failed paths
> > > > > > is
> > > > > > removed,
> > > > > > when the device is reloaded, queue_if_no_paths will be re-
> > > > > > enabled. 
> > > > > > When
> > > > > > set_no_path_retry() is then called to update the queueing
> > > > > > state,
> > > > > > it
> > > > > > will
> > > > > > not disable queue_if_no_paths, since the device is still in
> > > > > > the
> > > > > > recovery
> > > > > > state, so it believes no work needs to be done. The device
> > > > > > will
> > > > > > remain
> > > > > > in the recovery state, with retry_ticks at 0, and queueing
> > > > > > enabled,
> > > > > > even though there are no usable paths.
> > > > > > 
> > > > > > To fix this, in set_no_path_retry(), if no_path_retry is set
> > > > > > to a
> > > > > > number
> > > > > > and the device is queueing but it is in recovery mode and out
> > > > > > of
> > > > > > retries with no usable paths, manually disable
> > > > > > queue_if_no_path.
> > > > > > 
> > > > > > Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx>
> > > > > 
> > > > > Thanks, this makes sense. I am just wondering if this logic
> > > > > should
> > > > > be moved to enter_recovery_mode() / leave_recovery_mode()
> > > > > instead.
> > > > > That pair of functions is also called from
> > > > > update_queue_mode{add,del}_path(), where the same reasoning
> > > > > would
> > > > > apply, afaics. Am I overlooking something?
> > > > 
> > > > In general, I think it's safe to only do this in
> > > > set_no_path_retry().
> > > > The functions that call update_queue_mode{add,del}_path() are:
> > > > fail_path()
> > > > reinstate_path()
> > > > update_multipath()
> > > > io_err_stat_handle_pathfail()
> > > > 
> > > > fail_path() and reinstate_path() will always call the
> > > > update_queue_mode_* function after calling set_no_path_retry(),
> > > > so
> > > > the
> > > > device will already be in the correct state. update_multipath()
> > > > usually
> > > > will as well (except when it's called by a command line handler),
> > > > and
> > > > further, it will only call update_queue_mode_del_path() when it
> > > > is
> > > > actually failing a path, so the multipath device shouldn't
> > > > already be
> > > > in recovery_mode. The same goes for
> > > > io_err_stat_handle_pathfail().
> > > > 
> > > > Also we need to make sure that mpp->features is up to date before
> > > > doing
> > > > these checks, so that we know what the current queueing state is.
> > > > io_err_stat_handle_pathfail() doesn't update mpp->features before
> > > > calling update_queue_mode_del_path(), so it could end up doing
> > > > the
> > > > wrong
> > > > thing.
> > > > 
> > > > And set_no_path_retry() will be called (after updating mpp-
> > > > >features)
> > > > the next time the path checker completes for a path in the
> > > > device, so
> > > > devices won't remain in a bad state long, no matter what.
> > > > 
> > > > What looking at this did make me realize is that
> > > > cli_restore_queueing()
> > > > and cli_restore_all_queueing() don't update the device strings
> > > > before
> > > > calling set_no_path_retry(). Thinking about this, it seems like
> > > > they
> > > > should call update_multipath(vecs, mpp->alias, 1) instead, to
> > > > trigger
> > > > the set_no_path_retry() call after updating mpp->features.
> > > > 
> > > > I'll send another patch to fix cli_restore_queueing() and
> > > > cli_restore_all_queueing(). If you think it's necessary, I can
> > > > also
> > > > send
> > > > one to make io_err_stat_handle_pathfail() call update_multipath()
> > > > and
> > > > then move the code to manually change the queueing state to
> > > > enter_recovery_mode() and leave_recovery_mode(), instead of doing
> > > > it
> > > > in
> > > > set_no_path_retry().
> > > > 
> > > 
> > > Thanks for working out all the different cases. Personally, I would
> > > like to have this logic in one place, if possible. I'll leave it to
> > > your judgement, as you've taken a closer look than myself.
> > > 
> > > I'm not sure what you mean with "mpp->features is up to date"; are
> > > you
> > > considering some entity applying changes to the mpp features
> > > without
> > > involving multipathd, e.g. by using direct DM ioctls or dmsetup?
> > 
> > Or right after multipathd calls dm_queue_if_no_path(). We don't
> > immediately update mpp->features to reflect the changed
> > queue_if_no_paths setting of the device. So there are windows after
> > someone, including multipathd, has changed the queue_if_no_paths
> > setting
> > of the device, during which checking mpp->features won't give the
> > correct answer unless we first reread the device table. Obviously, if
> > something other than multipathd has changed this, then there's always
> > the possibility of a race, but we can make sure that we don't race
> > with
> > ourselves.
> > 
> > For instance, the way things are right now, running:
> > 
> > # multipathd disablequeueing maps; multipathd restorequeueing maps
> > 
> > will occasionally leave the devices not queueing until the next time
> > a
> > path in them is checked, and set_no_path_retry actually restores
> > queueing.  But I'm actually going to deal with this differently. I'll
> > send a patchset shortly.
> 
> You are right, I see. It's actually an inconsistency that I would like
> to get rid of. I suggest that we ditch mpp->features entirely, and just
> keep track of the map's features using an abstract representation using
> booleans and ints. We should never apply kernel status changes that are
> inconsistent with our own representation of the map's features, IMO.

I don't agree with ditching it entirely. The existence of mpp->features
allows people to use a kernel feature with a version of the userspace
tools that doesn't know about it. This means that if a new feature is
added to the kernel, the existing multipath userspace tools can be used
to test it.

We need to provide users the option of setting features, and for
backwards compatibility, we need to accept strings in the feature line
that we also have other options to set. And we still need to build up a
features line to pass into the kernel. So we will still need the code to
reconcile the features line with other options.  I'm not sure we would
gain much by converting it to a set of variables and then back again.

The one case where we really need to track our current value separate
from the features line is queue_if_no_path, and it does make sense to
track that as a variable, so we can easily update it when we make
changes to it. But even there, the current value is not necessarily the
value we want to put in the features line we pass to the kernel. If we
aren't queueing, and then add a working path, when we reload the device
we want to use the configured no_path_retry value, and not our current
queueing state.

One thing that would be nice would be tweaking the queue_if_no_path
value in the features string that we pass to the kernel based on the
current queueing state and the current path states. If we are reloading
a device, and we know that all the paths in that device are down, and we
currently aren't queueing, we shouldn't pass in a features string that
tells the kernel to enable queueing, which is what we currently do. The
patch I posted which started this conversation exists because we don't
do this, and we need to fix the queueing state afterwards.
 
> Along a similar line of reasoning, let me emphasize again why I would
> like to have the logic for enabling or disabling queueing in one place.
> Your previous post nicely distinguished the various cases and call
> chains which change the queueing and recovery mode, but it would be
> better if that kind of assessment wasn't necessary, and instead it was
> possible to inspect the code in just a block of (say) 50 code lines
> that would decide whether or not a given map should be queueing and /
> or in recovery mode.
> 
> It's an unfortunate property of the multipath-tools code base that
> important logic is dispersed over multiple functions and source files,
> making a correct assessment of the code flow difficult and cumbersome.
> It can't be completely avoided given the complexity of the subject, but
> it would be nice to try.
> 
> I'm looking forward to your new patch set ;-)

Unfortunately, my new patchset doesn't change the first patch. It just
builds on it to clean up how the interactive client commands call things,
to reduce the number of callers of some of our functions, and to keep the
client commands from doing more work than they should. After the changes,
the only things calling set_no_path_retry() are functions that are
updating the multipath device due to external events, and the check_path
code.

The update_queue_mode{add,del}_path() functions should be (and, based on
my examination of the code, are) able to assume the device is in the
correct queueing and recovery state, and just do the work of entering or
leaving recovery mode based on a change in the number of active paths.

If we move the code that checks whether our queueing state is wrong out
of set_no_path_retry() into update_queue_mode{add,del}_path(), then we
add a requirement that callers of those functions refresh the current
queueing state first. Otherwise we can't actually guarantee that they do
the right thing. With the current callers, this doesn't really matter,
because we don't need them to fix the queueing state. But why not keep
the code that fixes the queueing state in a function that's called when
the state could be wrong, and that (after my new patchset) is only called
after we have refreshed the state, so it will do the right thing?

So, I would prefer if we go with my first patch, as is.  I'm open to the
idea of moving this code into update_queue_mode{add,del}_path() as part
of a future patchset where we switch to tracking the current queueing
state outside of mpp->features, and updating it whenever we actually
enable or disable queueing. I can work on this if you want.

-Ben 

> Thanks,
> Martin
> 