Re: [PATCH 2/3] libmultipath: change flush_on_last_del to fix a multipathd hang

On Wed, 2024-04-24 at 16:42 -0400, Benjamin Marzinski wrote:
> On Wed, Apr 24, 2024 at 06:34:35PM +0200, Martin Wilck wrote:
> > On Tue, 2024-04-23 at 18:25 -0400, Benjamin Marzinski wrote:
> > 
> > > 
> > > I chose "unused" as the default because this should generally
> > > only cause multipathd to work differently from the user's
> > > perspective when nothing has the multipath device open but it is
> > > queueing and there is outstanding IO. Without this change, it
> > > would have hung instead of failing the outstanding IO. However, I
> > > do understand that an argument could be made that "never" makes
> > > more sense as the default, even though it will cause multipathd
> > > to skip autoremoves in cases where it wouldn't before.
> > 
> > How important is the autoremove feature, actually? I have to
> > confess I don't like it too much, anyway.
> > 
> > I'd actually vote to set the default to "never", even if that means
> > a change in the default behavior, unless someone convinces me that
> > autoremoving unused maps with no paths is actually the right thing
> > to do.
> 
> If we changed the default behavior, we'd find out how much users rely
> on it. My guess is "more than they should". Ideally, users would shut
> down their multipath device before removing their underlying path
> devices. In practice, I bet that a significant number do not.

I'm not sure if we should continue encouraging this sort of lazy
practice. Actually, it doesn't matter that much whether the multipath
device persists. If there's anything of value, like a file system,
database storage, or whatever, in the stack on top, the user would be
well advised to close/umount it before deleting the path devices. If
she does, also destroying the multipath device would be a minor step.
If she doesn't, the fact that the multipath device persists would be a
lesser concern.

But of course you're right: we don't know what our users actually do
in this respect.

> Since deleting path devices isn't a common thing, and we can argue
> that we are just enforcing good behavior, it's probably o.k. to
> change it, but I'm pretty sure some users would complain or file
> bugs. Off the top of my head, one thing that could potentially go
> wrong (although I hope this isn't likely in the real world) is:
> 
> 1. User removes the LUNs under an unused multipath device. The
>    multipath device isn't removed.
> 2. User creates new LUNs with a different size. The array gives these
>    new LUNs the same WWID as the old ones (I don't know how often
>    this happens. I hope rarely, if at all).
> 3. Multipathd tries to add these new LUNs to the existing multipath
>    device and fails because the size doesn't match.

My first thought here was: we could avoid this scenario by overriding
the mp's size with the path's size if the existing map had no devices.
But if there was actually pending, queued I/O on the device, we might
corrupt the new device as soon as it's added. That wouldn't be a good
idea.

But then, the situation at hand would be rather obvious: the customer
would need to remove the existing, empty map before creating the new
one. If this fails because the map is in use, has outstanding I/O, or
both, it seems reasonable that the user would need to inspect the
situation and fix it manually. I don't think this can be automated on
our side.
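
For reference, I think the rejection in step 3 happens at a size
comparison along these lines; paraphrased from memory, not a verbatim
quote of the source:

/* Paraphrased sketch of the check that refuses to add a path whose
 * size disagrees with the existing map; not verbatim multipath-tools
 * code. */
if (pp->size != 0 && mpp->size != pp->size) {
	condlog(0, "%s: size %llu does not match %s (%llu), not adding path",
		pp->dev, pp->size, mpp->alias, mpp->size);
	/* skip this path */
}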

> Also, our current dev_loss_tmo resetting seems pretty racy. It's set
> to exactly the same length of time as we should retry for. If
> multipathd is slow in its retries, paths can get deleted while we are
> still queueing, making it more likely that you would have pathless
> multipath devices hanging around. In theory, we could add another
> check to see if a device has no paths and isn't in use when we
> disable queueing, but I don't see a reason why we should expand
> autoremoving, and we can easily pad the dev_loss_tmo value to reduce
> the race.

Good point.
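
For illustration, the padding could be as trivial as this; a sketch
only, where DEV_LOSS_PAD_SECS is made up and "checkint" stands for
whatever interval the real code multiplies no_path_retry by:

/* Pad dev_loss_tmo so it always outlives the retry window, instead of
 * setting it to exactly the retry window. DEV_LOSS_PAD_SECS is a
 * made-up constant for this sketch. */
#define DEV_LOSS_PAD_SECS 30

static unsigned int padded_dev_loss_tmo(unsigned int no_path_retry,
					unsigned int checkint)
{
	return no_path_retry * checkint + DEV_LOSS_PAD_SECS;
}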

> 
> > > 
> > > Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx>
> > > ---
> > >  libmultipath/defaults.h       |  2 +-
> > >  libmultipath/dict.c           | 72 +++++++++++++++++++++++++++++++++----
> > >  libmultipath/dict.h           |  1 +
> > >  libmultipath/hwtable.c        |  6 +--
> > >  libmultipath/propsel.c        |  4 +-
> > >  libmultipath/structs.h        |  7 ++--
> > >  multipath/multipath.conf.5.in | 20 +++++++---
> > >  multipathd/main.c             | 42 +++++++++++++++++---
> > >  8 files changed, 127 insertions(+), 27 deletions(-)
> > > 
> > > diff --git a/libmultipath/defaults.h b/libmultipath/defaults.h
> > > index 64b633f2..ed08c251 100644
> > 

[...]

> > How much sense do these hwtable entries make? In view of your
> > commit description, this seems to be a dangerous device default. It
> > has the surprising side effect that users who don't want
> > flush_on_last_del will need to set it either in the overrides
> > section or in a dedicated device section for the device at hand.
> > This can be changed in a separate patch of course.
> 
> I remember NetApp having strong feelings about their configs. They
> are the only user of the devices section user_friendly_names option.
> IIRC the only reason this option exists in the devices section is
> because they pushed for it.
> 
> But the way they have things set up, I don't see it as dangerous.
> They set dev_loss_tmo to "infinity" so their path devices should
> never disappear, unless someone chooses to remove them. They also set
> no_path_retry to "queue", since they don't want IO to ever fail on
> their devices unless someone manually disables queueing.

Yeah, I see it in 8e22682 ("multipath: Some device configuration
changes for NetApp LUNs", from 12 years ago).
I agree that in combination with "no_path_retry queue" and
"dev_loss_tmo infinity", this setting makes sense, or is at least not
harmful. Still, we could ask NetApp for their motivation.
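
From memory, the combination looks roughly like this in hwtable.c
(field names paraphrased, not copied verbatim; flush_on_last_del
presumably becomes FLUSH_ALWAYS with your renaming):

	/* Paraphrased hwtable.c entry, not verbatim */
	{
		.vendor            = "NETAPP",
		.product           = "LUN.*",
		.no_path_retry     = NO_PATH_RETRY_QUEUE,
		.dev_loss          = MAX_DEV_LOSS_TMO,
		.flush_on_last_del = FLUSH_ENABLED,
	},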


> It's odd, but nobody has ever complained that the builtin config for
> NetApp devices doesn't work like the other builtin configs, so it's
> just remained that way. Again, it seems like it's more trouble than
> it's worth to change it now.
> 
> On the other hand, the Infinidat config already sets no_path_retry to
> "fail", so there's not much point in also setting flush_on_last_del
> to "always".

I'd say it's dangerous, especially as they don't set dev_loss_tmo.

[...]
 
> > > +
> > > +	/* It's not safe to do a non-deferred remove of a map that has
> > > +	 * "queue_if_no_path" set, since there could be outstanding IO
> > > +	 * which will cause multipathd to hang while attempting the
> > > +	 * remove */
> > > +	if (mpp->flush_on_last_del == FLUSH_NEVER &&
> > > +	    !do_deferred(deferred_remove) && is_queueing) {
> > > +		condlog(2, "%s: map is queueing, can't remove", mpp->alias);
> > > +		return false;
> > > +	}
> > 
> > Just a thought: why don't we just pretend that deferred_remove is
> > on if flush_on_last_del is set to "never"? Wouldn't that avoid our
> > issues? In the same context, why haven't we made deferred_remove
> > the default yet?
> 
> I'm not sure we want to change the default multipath behavior without
> a good reason. Users don't expect that removing a multipath device
> can succeed, but the device will still be there.

Right. We *could* distinguish auto-removal and manual removal and fall
back to deferred remove only in the auto-removal case. But I guess that
over-complicates matters.

Btw, I realize that in devmapper.h we pass 0 for the deferred parameter
of _dm_flush_map, while we should rather pass DEFERRED_REMOVE_OFF. It
has no actual consequences, but maybe you want to fix it while you're
working on this?

#define dm_flush_map(mapname) _dm_flush_map(mapname, 1, 0, 0, 0)
#define dm_flush_map_nosync(mapname) _dm_flush_map(mapname, 0, 0, 0, 0)
#define dm_suspend_and_flush_map(mapname, retries) \
	_dm_flush_map(mapname, 1, 0, 1, retries)
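
I.e., something like this (untested):

#define dm_flush_map(mapname) \
	_dm_flush_map(mapname, 1, DEFERRED_REMOVE_OFF, 0, 0)
#define dm_flush_map_nosync(mapname) \
	_dm_flush_map(mapname, 0, DEFERRED_REMOVE_OFF, 0, 0)
#define dm_suspend_and_flush_map(mapname, retries) \
	_dm_flush_map(mapname, 1, DEFERRED_REMOVE_OFF, 1, retries)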


> Worse, I just tested and found out that my test will still hang
> multipathd with my new code if you have deferred_remove turned on.
> This is because there is no opener of the kpartx device, so
> device-mapper tries to immediately remove it. Apparently, it's also
> not safe to trigger a deferred remove without disabling queueing.
> 

Ugh. I guess you'll fix this in a v2, right?


> > > +	if (mpp->flush_on_last_del == FLUSH_UNUSED &&
> > > +	    !do_deferred(deferred_remove) &&
> > > +	    (in_use = partmap_in_use(mpp->alias, NULL)) && is_queueing) {
> > > +		condlog(2, "%s: map in use and queueing, can't remove",
> > > +			mpp->alias);
> > > +		return false;
> > > +	}
> > 
> > Same reasoning as above, why not just defer?
> > 
> > >  	/*
> > > -	 * flush_map will fail if the device is open
> > > +	 * This will flush FLUSH_NEVER devices and FLUSH_UNUSED devices
> > > +	 * that are in use, but only if this is not a deferred remove, and
> > 
> > "that are not in use" ?
> 
> This *will* disable queue_if_no_path on in-use devices. But like I
> said, only if, according to multipathd, the device is already set to
> queue_if_no_path. This is just to make sure the mpp->features isn't
> out of date, since getting this wrong and hanging multipathd is a bad
> thing.

I understand that you're calling dm_queue_if_no_path(mpp, 0) to be on
the safe side. That makes sense. But the comment is confusing. AFAICS,
you're doing this regardless of the value of is_queueing; only the log
priority of the message depends on is_queueing.

This logic is complicated (there are 24 possible combinations of
flush_on_last_del, deferred, is_queueing, and in_use). It took me a
while to figure out, but meanwhile I think you got it right; it's just
the comment that keeps confusing me.
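
To convince myself, I condensed my reading of the new checks into this
little helper; just my paraphrase for discussion, not code from your
patch:

#include <stdbool.h>

/* My reading of when flush_map() may proceed with the removal. */
static bool removal_allowed(int flush_on_last_del, bool deferred,
			    bool is_queueing, bool in_use)
{
	if (deferred)
		return true;	/* the hang only affects non-deferred removes */
	if (flush_on_last_del == FLUSH_NEVER && is_queueing)
		return false;	/* outstanding queued IO would hang multipathd */
	if (flush_on_last_del == FLUSH_UNUSED && in_use && is_queueing)
		return false;	/* same hazard while the map is held open */
	return true;
}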

> Alternatively, I could call update_multipath_table() to make sure
> mpp->features is up to date. This way does less work, and if
> multipathd thinks that the device shouldn't be queueing, then it
> really shouldn't be queueing.

I didn't mean to say that you should do this.

Regards,
Martin





