Re: [PATCH] multipathd: Make sure to disable queueing if recovery has failed.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 2023-11-27 at 16:54 -0500, Benjamin Marzinski wrote:
> If a multipath device has no_path_retry set to a number and has lost
> all
> paths, gone into recovery mode, and timed out, it will disable
> queue_if_no_paths. After that, if one of those failed paths is
> removed,
> when the device is reloaded, queue_if_no_paths will be re-enabled. 
> When
> set_no_path_retry() is then called to update the queueing state, it
> will
> not disable queue_if_no_paths, since the device is still in the
> recovery
> state, so it believes no work needs to be done. The device will
> remain
> in the recovery state, with retry_ticks at 0, and queueing enabled,
> even though there are no usable paths.
> 
> To fix this, in set_no_path_retry(), if no_path_retry is set to a
> number
> and the device is queueing but it is in recovery mode and out of
> retries with no usable paths, manually disable queue_if_no_path.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx>

Thanks, this makes sense. I am just wondering if this logic should
be moved to enter_recovery_mode() / leave_recovery_mode() instead.
That pair of functions is also called from
update_queue_mode{add,del}_path(), where the same reasoning would
apply, afaics. Am I overlooking something?

Regards
Martin

> ---
>  libmultipath/structs_vec.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
> index 0e8a46e7..3cb23c73 100644
> --- a/libmultipath/structs_vec.c
> +++ b/libmultipath/structs_vec.c
> @@ -627,8 +627,18 @@ void set_no_path_retry(struct multipath *mpp)
>  			    !mpp->in_recovery)
>  				dm_queue_if_no_path(mpp->alias, 1);
>  			leave_recovery_mode(mpp);
> -		} else if (pathcount(mpp, PATH_PENDING) == 0)
> +		} else if (pathcount(mpp, PATH_PENDING) == 0) {
> +			/*
> +			 * If in_recovery is set,
> enter_recovery_mode does
> +			 * nothing. If the device is already in
> recovery
> +			 * mode and has already timed out, manually
> call
> +			 * dm_queue_if_no_path to stop it from
> queueing.
> +			 */
> +			if ((!mpp->features || is_queueing) &&
> +			    mpp->in_recovery && mpp->retry_tick ==
> 0)
> +				dm_queue_if_no_path(mpp->alias, 0);
>  			enter_recovery_mode(mpp);
> +		}
>  		break;
>  	}
>  }






[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux