Re: [PATCH v2 2/5] libmultipath: change flush_on_last_del to fix a multipathd hang

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote:
> Commit 9bd3482e ("multipathd: make flush_map() delete maps like the
> multipath command") fixed an issue where deleting a queueing
> multipath
> device through multipathd could hang because the multipath device had
> outstanding IO, even though the only openers of it at the time of
> deletion were the kpartx partition devices. However it is still
> possible
> to hang multipathd, because autoremoving the device when all paths
> have
> been deleted doesn't disable queueing. To reproduce this hang:
> 
> 1. create a multipath device with a kpartx partition on top of it and
> no_path_retry set to either "queue" or something long enough to run
> all
> the commands in the reproducer before it disables queueing.
> 2. disable all the paths to the device with something like:
>  # echo offline > /sys/block/<path_dev>/device/state
> 3. Write directly to the multipath device with something like:
>  # dd if=/dev/zero of=/dev/mapper/<mpath_dev> bs=4K count=1
> 4. delete all the paths to the device with something like:
>  # echo 1 > /sys/block/<path_dev>/device/delete
> 
> Multipathd will hang trying to delete the kpartx device because, as
> the
> last opener, it must flush the multipath device before closing it.
> Because it hangs holding the vecs_lock, multipathd will never disable
> queueing on the device, so it will hang forever, even if
> no_path_retry
> is set to a number.
> 
> This hang can occur, even if deferred_remove is set. Since nothing
> has
> the kpartx device opened, device-mapper does an immediate remove,
> which
> will still hang. This means that even if deferred_remove is set,
> multipathd still cannot delete a map while queueing is enabled. It
> must
> either disable queueing or skip the autoremove.
> 
> Mulitpath can currently be configured to avoid this hang by setting
> 
> flush_on_last_del yes
> 
> However there are good reasons why users wouldn't want to set that.
> They
> may need to be able to survive having all paths getting temporarily
> deleted.  I should note that this is a pretty rare corner case, since
> multipath automatically sets dev_loss_tmo so that it should not
> trigger
> before queueing is disabled.
> 
> This commit avoids the hang by changing the possible values for
> flush_on_last_del to "never", "unused", and "always", and sets the
> default to "unused".  "always" works like "yes" did, "never" will not
> disable queueing, and "unused" will only disable queueing if nothing
> has
> the kpartx devices or the multipath device open. In order to be safe,
> if
> the device has queue_if_no_paths set (and in case of "unused", the
> device is in-use) the autoremove will be skipped. Also, instead of
> just
> trusting the lack of "queue_if_no_paths" in the current mpp-
> >features,
> multipathd will tell the kernel to disable queueing, just to be sure
> it
> actually is.
> 
> I chose "unused" as the default because this should generally only
> cause
> multipathd to work differently from the users perspective when
> nothing
> has the multipath device open but it is queueing and there is
> outstanding IO. Without this change, it would have hung instead of
> failing the outstanding IO. However, I do understand that an argument
> could be made that "never" makes more sense as default, even though
> it
> will cause multipathd to skip autoremoves in cases where it wouldn't
> before. The change to the behavior of deffered_remove will be
> noticeable, but skipping an autoremove is much better than hanging.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx>
> ---
>  libmultipath/defaults.h       |  2 +-
>  libmultipath/dict.c           | 72 +++++++++++++++++++++++++++++++--
> --
>  libmultipath/dict.h           |  1 +
>  libmultipath/hwtable.c        |  6 +--
>  libmultipath/propsel.c        |  4 +-
>  libmultipath/structs.h        |  7 ++--
>  multipath/multipath.conf.5.in | 20 +++++++---
>  multipathd/main.c             | 39 +++++++++++++++----
>  8 files changed, 122 insertions(+), 29 deletions(-)
> 
> diff --git a/libmultipath/defaults.h b/libmultipath/defaults.h
> index 64b633f2..ed08c251 100644

The spell checker noted a typo ("parition") which I'm going to amend.

Reviewed-by: Martin Wilck <mwilck@xxxxxxxx>







[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux