On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote: > Commit 9bd3482e ("multipathd: make flush_map() delete maps like the > multipath command") fixed an issue where deleting a queueing > multipath > device through multipathd could hang because the multipath device had > outstanding IO, even though the only openers of it at the time of > deletion were the kpartx partition devices. However it is still > possible > to hang multipathd, because autoremoving the device when all paths > have > been deleted doesn't disable queueing. To reproduce this hang: > > 1. create a multipath device with a kpartx partition on top of it and > no_path_retry set to either "queue" or something long enough to run > all > the commands in the reproducer before it disables queueing. > 2. disable all the paths to the device with something like: > # echo offline > /sys/block/<path_dev>/device/state > 3. Write directly to the multipath device with something like: > # dd if=/dev/zero of=/dev/mapper/<mpath_dev> bs=4K count=1 > 4. delete all the paths to the device with something like: > # echo 1 > /sys/block/<path_dev>/device/delete > > Multipathd will hang trying to delete the kpartx device because, as > the > last opener, it must flush the multipath device before closing it. > Because it hangs holding the vecs_lock, multipathd will never disable > queueing on the device, so it will hang forever, even if > no_path_retry > is set to a number. > > This hang can occur, even if deferred_remove is set. Since nothing > has > the kpartx device opened, device-mapper does an immediate remove, > which > will still hang. This means that even if deferred_remove is set, > multipathd still cannot delete a map while queueing is enabled. It > must > either disable queueing or skip the autoremove. > > Mulitpath can currently be configured to avoid this hang by setting > > flush_on_last_del yes > > However there are good reasons why users wouldn't want to set that. > They > may need to be able to survive having all paths getting temporarily > deleted. I should note that this is a pretty rare corner case, since > multipath automatically sets dev_loss_tmo so that it should not > trigger > before queueing is disabled. > > This commit avoids the hang by changing the possible values for > flush_on_last_del to "never", "unused", and "always", and sets the > default to "unused". "always" works like "yes" did, "never" will not > disable queueing, and "unused" will only disable queueing if nothing > has > the kpartx devices or the multipath device open. In order to be safe, > if > the device has queue_if_no_paths set (and in case of "unused", the > device is in-use) the autoremove will be skipped. Also, instead of > just > trusting the lack of "queue_if_no_paths" in the current mpp- > >features, > multipathd will tell the kernel to disable queueing, just to be sure > it > actually is. > > I chose "unused" as the default because this should generally only > cause > multipathd to work differently from the users perspective when > nothing > has the multipath device open but it is queueing and there is > outstanding IO. Without this change, it would have hung instead of > failing the outstanding IO. However, I do understand that an argument > could be made that "never" makes more sense as default, even though > it > will cause multipathd to skip autoremoves in cases where it wouldn't > before. The change to the behavior of deffered_remove will be > noticeable, but skipping an autoremove is much better than hanging. > > Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx> > --- > libmultipath/defaults.h | 2 +- > libmultipath/dict.c | 72 +++++++++++++++++++++++++++++++-- > -- > libmultipath/dict.h | 1 + > libmultipath/hwtable.c | 6 +-- > libmultipath/propsel.c | 4 +- > libmultipath/structs.h | 7 ++-- > multipath/multipath.conf.5.in | 20 +++++++--- > multipathd/main.c | 39 +++++++++++++++---- > 8 files changed, 122 insertions(+), 29 deletions(-) > > diff --git a/libmultipath/defaults.h b/libmultipath/defaults.h > index 64b633f2..ed08c251 100644 The spell checker noted a typo ("parition") which I'm going to amend. Reviewed-by: Martin Wilck <mwilck@xxxxxxxx>