Re: [PATCH v2 1/5] multipathd: Make sure to disable queueing if recovery has failed.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Dec 14, 2023 at 04:11:32PM +0100, Martin Wilck wrote:
> On Tue, 2023-12-05 at 18:57 -0500, Benjamin Marzinski wrote:
> > If a multipath device has no_path_retry set to a number and has lost
> > all
> > paths, gone into recovery mode, and timed out, it will disable
> > queue_if_no_paths. After that, if one of those failed paths is
> > removed,
> > when the device is reloaded, queue_if_no_paths will be re-enabled. 
> > When
> > set_no_path_retry() is then called to update the queueing state, it
> > will
> > not disable queue_if_no_paths, since the device is still in the
> > recovery
> > state, so it believes no work needs to be done. The device will
> > remain
> > in the recovery state, with retry_ticks at 0, and queueing enabled,
> > even though there are no usable paths.
> > 
> > To fix this, in set_no_path_retry(), if no_path_retry is set to a
> > number
> > and the device is queueing but it is in recovery mode and out of
> > retries with no usable paths, manually disable queue_if_no_path.
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx>
> > ---
> >  libmultipath/structs_vec.c | 23 ++++++++++++++++++++++-
> >  1 file changed, 22 insertions(+), 1 deletion(-)
> > 
> > diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
> > index 0e8a46e7..9345b622 100644
> > --- a/libmultipath/structs_vec.c
> > +++ b/libmultipath/structs_vec.c
> > @@ -627,8 +627,18 @@ void set_no_path_retry(struct multipath *mpp)
> >  			    !mpp->in_recovery)
> >  				dm_queue_if_no_path(mpp->alias, 1);
> >  			leave_recovery_mode(mpp);
> > -		} else if (pathcount(mpp, PATH_PENDING) == 0)
> > +		} else if (pathcount(mpp, PATH_PENDING) == 0) {
> > +			/*
> > +			 * If in_recovery is set,
> > enter_recovery_mode does
> > +			 * nothing. If the device is already in
> > recovery
> > +			 * mode and has already timed out, manually
> > call
> > +			 * dm_queue_if_no_path to stop it from
> > queueing.
> > +			 */
> > +			if ((!mpp->features || is_queueing) &&
> > +			    mpp->in_recovery && mpp->retry_tick ==
> > 0)
> > +				dm_queue_if_no_path(mpp->alias, 0);
> 
> I don't understand why you put this statement in the pathcount(mpp,
> PATH_PENDING) == 0 clause. In 7278108 ("libmultipath: don't enter
> recovery mode with pending paths"), you argued correctly that recovery
> mode should not be *entered* with pending paths, but here we are in
> recovery mode already.

I see your point. But I think the only way this could happen is if you
added a new path to a multipath device that was already in recovery
mode. If the path checker didn't finish quickly enough during path
addition, the path will be in PATH_PENDING after the reload. The fact
that you are adding the path makes it seem more likely that the path is
working. In this case, it seems a little wrong to have a multipath
device that starts queueing on the reload, and then disables queueing,
only to reenable it again a second later when the pending path is
rechecked and is found to be up.  I think it's reasonable to wait for
the PENDING state to get resolved before you take any actions here. But
I don't really have strong feelings about this, so if you do, I'm fine
with changing it.
 
> Also, what do you think about the following? This way, we'd avoid
> enabling queueing temporarily when we reload the map.
> 
> diff --git a/libmultipath/dmparser.c b/libmultipath/dmparser.c
> index 3b37a92..3d85e6e 100644
> --- a/libmultipath/dmparser.c
> +++ b/libmultipath/dmparser.c
> @@ -60,9 +60,19 @@ int assemble_map(struct multipath *mp, char **params)
>         nr_priority_groups = VECTOR_SIZE(mp->pg);
>         initial_pg_nr = (nr_priority_groups ? mp->bestpg : 0);
>  
> -       if (mp->no_path_retry != NO_PATH_RETRY_UNDEF  &&
> -           mp->no_path_retry != NO_PATH_RETRY_FAIL) {
> +       switch (mp->no_path_retry) {
> +       case NO_PATH_RETRY_UNDEF:
> +       case NO_PATH_RETRY_FAIL:
> +               break;
> +       default:
> +               /* don't enable queueing if no_path_retry has timed out */
> +               if (mp->in_recovery && mp->retry_tick == 0 &&
> +                   count_active_paths(mp) == 0)
> +                       break;
> +               /* fallthrough */
> +       case NO_PATH_RETRY_QUEUE:
>                 add_feature(&mp->features, no_path_retry);
> +               break;
>         }
>         if (mp->retain_hwhandler == RETAIN_HWHANDLER_ON &&
>             get_linux_version_code() < KERNEL_VERSION(4, 3, 0))

This looks good.

-Ben
> 
> 
> Regards
> Martin





[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux