Re: [PATCH 2/7] multipathd: fix check_path errors with removed map

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jun 18, 2020 at 07:34:38PM +0000, Martin Wilck wrote:
> On Wed, 2020-06-17 at 19:24 -0500, Benjamin Marzinski wrote:
> > If a multipath device is removed during, or immediately before the
> > call
> > to check_path(), multipathd can behave incorrectly. A missing
> > multpath
> > device will cause update_multipath_strings() to fail, setting
> > pp->dmstate to PSTATE_UNDEF.  If the path is up, this state will
> > cause
> > reinstate_path() to be called, which will also fail.  This will
> > trigger
> > a reload, restoring the recently removed device.
> > 
> > If update_multipath_strings() fails because there is no multipath
> > device, check_path should just quit, since the remove dmevent and
> > uevent
> > are likely already queued up. Also, I don't see any reason to reload
> > the
> > multipath device if reinstate fails. This code was added by
> > fac68d7a99ef17d496079538a5c6836acd7911ab, which clamined that
> > reinstate
> > could fail if the path was disabled.  Looking through the current
> > kernel
> > code, I can't see any reason why a reinstate would fail, where a
> > reload
> > would help. If the path was missing from the multipath device,
> > update_multipath_strings() would already catch that, and quit
> > check_path() early, which make more sense to me than reloading does.
> 
> fac68d7 is related to the famous "dm-multipath: Accept failed paths for
> multipath maps" patch (e.g. 
> https://patchwork.kernel.org/patch/3368381/#7193001), which never made
> it upstream. SUSE kernels have shipped this patch for a long time, but
> we don't apply it any more in recent kernels.
> 
> With this patch, The reinstate_path() failure would occur if multipathd
> had created a table with a "disabled" device (one which would be
> present in a dm map even though the actual block device didn't exist),
> and then tried to reinstate such a path. It sounds unlikely, but it
> might be possible if devices are coming and going in quick succession.
> In that situation, and with the "accept failed path" patch applied, a
> reload makes some sense, because reload (unlike reinstate) would not
> fail (at least not for this reason) and would actually add that just-
> reinstated path. OTOH, it's not likely that the reload would improve
> matters, either. After all, multipathd is just trying to reinstate a
> non-existing path. So, I'm fine with skipping the reload.
> 
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx>
> > ---
> >  multipathd/main.c | 36 ++++++++++++++----------------------
> >  1 file changed, 14 insertions(+), 22 deletions(-)
> > 
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index 6b7db2c0..8fb73922 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -1611,22 +1611,18 @@ fail_path (struct path * pp, int del_active)
> >  /*
> >   * caller must have locked the path list before calling that
> > function
> >   */
> > -static int
> > +static void
> >  reinstate_path (struct path * pp)
> >  {
> > -	int ret = 0;
> > -
> >  	if (!pp->mpp)
> > -		return 0;
> > +		return;
> >  
> > -	if (dm_reinstate_path(pp->mpp->alias, pp->dev_t)) {
> > +	if (dm_reinstate_path(pp->mpp->alias, pp->dev_t))
> >  		condlog(0, "%s: reinstate failed", pp->dev_t);
> > -		ret = 1;
> > -	} else {
> > +	else {
> >  		condlog(2, "%s: reinstated", pp->dev_t);
> >  		update_queue_mode_add_path(pp->mpp);
> >  	}
> > -	return ret;
> >  }
> >  static void
> > @@ -2088,8 +2084,13 @@ check_path (struct vectors * vecs, struct path
> > * pp, unsigned int ticks)
> >  	 * Synchronize with kernel state
> >  	 */
> >  	if (update_multipath_strings(pp->mpp, vecs->pathvec, 1)) {
> > +		struct dm_info info;
> >  		condlog(1, "%s: Could not synchronize with kernel
> > state",
> >  			pp->dev);
> > +		if (pp->mpp && pp->mpp->alias &&
> > +		    do_dm_get_info(pp->mpp->alias, &info) == 0)
> > +			/* multipath device missing. Likely removed */
> > +			return 0;
> >  		pp->dmstate = PSTATE_UNDEF;
> 
> It would be more elegant if we could distinguish different failure
> modes from update_multipath_strings() directly, without having to call
> do_dm_get_info() again.

Seems reasonable. I'll take a look at that.

-Ben
 
> >  	}
> >  	/* if update_multipath_strings orphaned the path, quit early */
> > @@ -2179,12 +2180,8 @@ check_path (struct vectors * vecs, struct path
> > * pp, unsigned int ticks)
> >  		/*
> >  		 * reinstate this path
> >  		 */
> > -		if (!disable_reinstate && reinstate_path(pp)) {
> > -			condlog(3, "%s: reload map", pp->dev);
> > -			ev_add_path(pp, vecs, 1);
> > -			pp->tick = 1;
> > -			return 0;
> > -		}
> > +		if (!disable_reinstate)
> > +			reinstate_path(pp);
> >  		new_path_up = 1;
> >  
> >  		if (oldchkrstate != PATH_UP && oldchkrstate !=
> > PATH_GHOST)
> > @@ -2200,15 +2197,10 @@ check_path (struct vectors * vecs, struct
> > path * pp, unsigned int ticks)
> >  	else if (newstate == PATH_UP || newstate == PATH_GHOST) {
> >  		if ((pp->dmstate == PSTATE_FAILED ||
> >  		    pp->dmstate == PSTATE_UNDEF) &&
> > -		    !disable_reinstate) {
> > +		    !disable_reinstate)
> >  			/* Clear IO errors */
> > -			if (reinstate_path(pp)) {
> > -				condlog(3, "%s: reload map", pp->dev);
> > -				ev_add_path(pp, vecs, 1);
> > -				pp->tick = 1;
> > -				return 0;
> > -			}
> > -		} else {
> > +			reinstate_path(pp);
> > +		else {
> >  			LOG_MSG(4, verbosity, pp);
> >  			if (pp->checkint != max_checkint) {
> >  				/*
> 
> -- 
> Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107
> SUSE  Software Solutions Germany GmbH
> HRB 36809, AG Nürnberg GF: Felix
> Imendörffer
> 

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel





[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux