Re: [PATCH v2 2/5] libmultipath: change flush_on_last_del to fix a multipathd hang

Martin Wilck <martin.wilck@xxxxxxxx> · Tue, 30 Apr 2024 19:06:24 +0200

On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote:
> 
> 1. create a multipath device with a kpartx partition on top of it and
> no_path_retry set to either "queue" or something long enough to run
> all
> the commands in the reproducer before it disables queueing.
> 2. disable all the paths to the device with something like:
>  # echo offline > /sys/block/<path_dev>/device/state
> 3. Write directly to the multipath device with something like:
>  # dd if=/dev/zero of=/dev/mapper/<mpath_dev> bs=4K count=1
> 4. delete all the paths to the device with something like:
>  # echo 1 > /sys/block/<path_dev>/device/delete

I've tried to reproduce the issue with these commands. The test system
used a LIO iSCSI target with 2 paths. I created a test script
(attached) to run the offline / IO / delete procedure repeatedly.
I haven't been able to make multipathd hang even once.
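
For readers without the attachment, one round of the offline / IO /
delete procedure could be sketched roughly as below. This is not the
attached flush-0-paths.sh. The names MPATH_DEV, PATH_DEVS and DRY_RUN,
and the defaults "mpatha" and "sdb sdc", are assumptions made for
illustration. By default the sketch only prints the commands; setting
DRY_RUN=0 (as root) would actually run them. Restoring the deleted
paths between rounds depends on the transport and is left out.

```shell
#!/bin/sh
# Sketch of one offline / IO / delete round; NOT the attached script.
# MPATH_DEV, PATH_DEVS and DRY_RUN are hypothetical names.
MPATH_DEV=${MPATH_DEV:-mpatha}
PATH_DEVS=${PATH_DEVS:-"sdb sdc"}
DRY_RUN=${DRY_RUN:-1}   # 1: only print the commands, 0: execute them

# Write a value to a sysfs attribute, or print what would be written.
sysfs_write() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "echo $1 > $2"
    else
        echo "$1" > "$2"
    fi
}

# Run a command, or print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

one_round() {
    # Step 2: take every path of the device offline.
    for p in $PATH_DEVS; do
        sysfs_write offline "/sys/block/$p/device/state"
    done
    # Step 3: buffered write directly to the multipath device.
    run dd if=/dev/zero "of=/dev/mapper/$MPATH_DEV" bs=4K count=1
    # Step 4: delete every path.
    for p in $PATH_DEVS; do
        sysfs_write 1 "/sys/block/$p/device/delete"
    done
}

one_round
```

With the defaults, this prints two "offline" writes, the dd command,
and two "delete" writes, in the same order as steps 2 to 4 above.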

I also played around with dd options. If I use oflag=sync or
oflag=direct, the dd command itself hangs.

Did I set something up incorrectly, or does the behavior depend on
the kernel or something else? Mine was a 6.4 kernel. This is not to
say there's something wrong with your patch, but I'd like to
understand the error situation better, as it doesn't seem to be
triggerable on my test system.

multipath.conf:

defaults {
	verbosity 3
	flush_on_last_del yes
}

blacklist {
	wwid QEMU
}

overrides {
	no_path_retry queue
}

Regards,
Martin

Attachment: flush-0-paths.sh (application/shellscript)