Re: [PATCH 2/7] multipathd: fix check_path errors with removed map

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2020-06-17 at 19:24 -0500, Benjamin Marzinski wrote:
> If a multipath device is removed during, or immediately before the
> call
> to check_path(), multipathd can behave incorrectly. A missing
> multpath
> device will cause update_multipath_strings() to fail, setting
> pp->dmstate to PSTATE_UNDEF.  If the path is up, this state will
> cause
> reinstate_path() to be called, which will also fail.  This will
> trigger
> a reload, restoring the recently removed device.
> 
> If update_multipath_strings() fails because there is no multipath
> device, check_path should just quit, since the remove dmevent and
> uevent
> are likely already queued up. Also, I don't see any reason to reload
> the
> multipath device if reinstate fails. This code was added by
> fac68d7a99ef17d496079538a5c6836acd7911ab, which clamined that
> reinstate
> could fail if the path was disabled.  Looking through the current
> kernel
> code, I can't see any reason why a reinstate would fail, where a
> reload
> would help. If the path was missing from the multipath device,
> update_multipath_strings() would already catch that, and quit
> check_path() early, which make more sense to me than reloading does.

fac68d7 is related to the famous "dm-multipath: Accept failed paths for
multipath maps" patch (e.g. 
https://patchwork.kernel.org/patch/3368381/#7193001), which never made
it upstream. SUSE kernels have shipped this patch for a long time, but
we don't apply it any more in recent kernels.

With this patch, The reinstate_path() failure would occur if multipathd
had created a table with a "disabled" device (one which would be
present in a dm map even though the actual block device didn't exist),
and then tried to reinstate such a path. It sounds unlikely, but it
might be possible if devices are coming and going in quick succession.
In that situation, and with the "accept failed path" patch applied, a
reload makes some sense, because reload (unlike reinstate) would not
fail (at least not for this reason) and would actually add that just-
reinstated path. OTOH, it's not likely that the reload would improve
matters, either. After all, multipathd is just trying to reinstate a
non-existing path. So, I'm fine with skipping the reload.

> 
> Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx>
> ---
>  multipathd/main.c | 36 ++++++++++++++----------------------
>  1 file changed, 14 insertions(+), 22 deletions(-)
> 
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 6b7db2c0..8fb73922 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -1611,22 +1611,18 @@ fail_path (struct path * pp, int del_active)
>  /*
>   * caller must have locked the path list before calling that
> function
>   */
> -static int
> +static void
>  reinstate_path (struct path * pp)
>  {
> -	int ret = 0;
> -
>  	if (!pp->mpp)
> -		return 0;
> +		return;
>  
> -	if (dm_reinstate_path(pp->mpp->alias, pp->dev_t)) {
> +	if (dm_reinstate_path(pp->mpp->alias, pp->dev_t))
>  		condlog(0, "%s: reinstate failed", pp->dev_t);
> -		ret = 1;
> -	} else {
> +	else {
>  		condlog(2, "%s: reinstated", pp->dev_t);
>  		update_queue_mode_add_path(pp->mpp);
>  	}
> -	return ret;
>  }
>  static void
> @@ -2088,8 +2084,13 @@ check_path (struct vectors * vecs, struct path
> * pp, unsigned int ticks)
>  	 * Synchronize with kernel state
>  	 */
>  	if (update_multipath_strings(pp->mpp, vecs->pathvec, 1)) {
> +		struct dm_info info;
>  		condlog(1, "%s: Could not synchronize with kernel
> state",
>  			pp->dev);
> +		if (pp->mpp && pp->mpp->alias &&
> +		    do_dm_get_info(pp->mpp->alias, &info) == 0)
> +			/* multipath device missing. Likely removed */
> +			return 0;
>  		pp->dmstate = PSTATE_UNDEF;

It would be more elegant if we could distinguish different failure
modes from update_multipath_strings() directly, without having to call
do_dm_get_info() again.

>  	}
>  	/* if update_multipath_strings orphaned the path, quit early */
> @@ -2179,12 +2180,8 @@ check_path (struct vectors * vecs, struct path
> * pp, unsigned int ticks)
>  		/*
>  		 * reinstate this path
>  		 */
> -		if (!disable_reinstate && reinstate_path(pp)) {
> -			condlog(3, "%s: reload map", pp->dev);
> -			ev_add_path(pp, vecs, 1);
> -			pp->tick = 1;
> -			return 0;
> -		}
> +		if (!disable_reinstate)
> +			reinstate_path(pp);
>  		new_path_up = 1;
>  
>  		if (oldchkrstate != PATH_UP && oldchkrstate !=
> PATH_GHOST)
> @@ -2200,15 +2197,10 @@ check_path (struct vectors * vecs, struct
> path * pp, unsigned int ticks)
>  	else if (newstate == PATH_UP || newstate == PATH_GHOST) {
>  		if ((pp->dmstate == PSTATE_FAILED ||
>  		    pp->dmstate == PSTATE_UNDEF) &&
> -		    !disable_reinstate) {
> +		    !disable_reinstate)
>  			/* Clear IO errors */
> -			if (reinstate_path(pp)) {
> -				condlog(3, "%s: reload map", pp->dev);
> -				ev_add_path(pp, vecs, 1);
> -				pp->tick = 1;
> -				return 0;
> -			}
> -		} else {
> +			reinstate_path(pp);
> +		else {
>  			LOG_MSG(4, verbosity, pp);
>  			if (pp->checkint != max_checkint) {
>  				/*

-- 
Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107
SUSE  Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix
Imendörffer



--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel





[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux