Re: [PATCH v2 1/5] multipathd: Make sure to disable queueing if recovery has failed.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2023-12-05 at 18:57 -0500, Benjamin Marzinski wrote:
> If a multipath device has no_path_retry set to a number and has lost
> all
> paths, gone into recovery mode, and timed out, it will disable
> queue_if_no_paths. After that, if one of those failed paths is
> removed,
> when the device is reloaded, queue_if_no_paths will be re-enabled. 
> When
> set_no_path_retry() is then called to update the queueing state, it
> will
> not disable queue_if_no_paths, since the device is still in the
> recovery
> state, so it believes no work needs to be done. The device will
> remain
> in the recovery state, with retry_ticks at 0, and queueing enabled,
> even though there are no usable paths.
> 
> To fix this, in set_no_path_retry(), if no_path_retry is set to a
> number
> and the device is queueing but it is in recovery mode and out of
> retries with no usable paths, manually disable queue_if_no_path.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx>
> ---
>  libmultipath/structs_vec.c | 23 ++++++++++++++++++++++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
> index 0e8a46e7..9345b622 100644
> --- a/libmultipath/structs_vec.c
> +++ b/libmultipath/structs_vec.c
> @@ -627,8 +627,18 @@ void set_no_path_retry(struct multipath *mpp)
>  			    !mpp->in_recovery)
>  				dm_queue_if_no_path(mpp->alias, 1);
>  			leave_recovery_mode(mpp);
> -		} else if (pathcount(mpp, PATH_PENDING) == 0)
> +		} else if (pathcount(mpp, PATH_PENDING) == 0) {
> +			/*
> +			 * If in_recovery is set,
> enter_recovery_mode does
> +			 * nothing. If the device is already in
> recovery
> +			 * mode and has already timed out, manually
> call
> +			 * dm_queue_if_no_path to stop it from
> queueing.
> +			 */
> +			if ((!mpp->features || is_queueing) &&
> +			    mpp->in_recovery && mpp->retry_tick ==
> 0)
> +				dm_queue_if_no_path(mpp->alias, 0);

I don't understand why you put this statement in the pathcount(mpp,
PATH_PENDING) == 0 clause. In 7278108 ("libmultipath: don't enter
recovery mode with pending paths"), you argued correctly that recovery
mode should not be *entered* with pending paths, but here we are in
recovery mode already.

Also, what do you think about the following? This way, we'd avoid
enabling queueing temporarily when we reload the map.

diff --git a/libmultipath/dmparser.c b/libmultipath/dmparser.c
index 3b37a92..3d85e6e 100644
--- a/libmultipath/dmparser.c
+++ b/libmultipath/dmparser.c
@@ -60,9 +60,19 @@ int assemble_map(struct multipath *mp, char **params)
        nr_priority_groups = VECTOR_SIZE(mp->pg);
        initial_pg_nr = (nr_priority_groups ? mp->bestpg : 0);
 
-       if (mp->no_path_retry != NO_PATH_RETRY_UNDEF  &&
-           mp->no_path_retry != NO_PATH_RETRY_FAIL) {
+       switch (mp->no_path_retry) {
+       case NO_PATH_RETRY_UNDEF:
+       case NO_PATH_RETRY_FAIL:
+               break;
+       default:
+               /* don't enable queueing if no_path_retry has timed out */
+               if (mp->in_recovery && mp->retry_tick == 0 &&
+                   count_active_paths(mp) == 0)
+                       break;
+               /* fallthrough */
+       case NO_PATH_RETRY_QUEUE:
                add_feature(&mp->features, no_path_retry);
+               break;
        }
        if (mp->retain_hwhandler == RETAIN_HWHANDLER_ON &&
            get_linux_version_code() < KERNEL_VERSION(4, 3, 0))


Regards
Martin






[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux