Re: [PATCH v2 2/5] libmultipath: change flush_on_last_del to fix a multipathd hang

On Tue, Apr 30, 2024 at 07:06:24PM +0200, Martin Wilck wrote:
> On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote:
> > 
> > 1. create a multipath device with a kpartx partition on top of it and
> > no_path_retry set to either "queue" or something long enough to run all
> > the commands in the reproducer before it disables queueing.
> > 2. disable all the paths to the device with something like:
> >  # echo offline > /sys/block/<path_dev>/device/state
> > 3. Write directly to the multipath device with something like:
> >  # dd if=/dev/zero of=/dev/mapper/<mpath_dev> bs=4K count=1
> > 4. delete all the paths to the device with something like:
> >  # echo 1 > /sys/block/<path_dev>/device/delete
> 
> I've tried to reproduce the issue with these commands. Test system was
> using a LIO iSCSI target with 2 paths. I created a test script
> (attached) to try the offline / IO / delete procedure repeatedly.
> I haven't been able to make multipathd hang even once.
> 
> I also played around with dd options. If I use oflag=sync or
> oflag=direct, the dd command itself hangs.
> 
> Did I set something up incorrectly, or does the behavior perhaps depend on
> the kernel or something else? Mine was a 6.4 kernel. This is
> not to say there's something wrong with your patch, but I'd like to
> understand the error situation better, as it doesn't seem to be
> trigger-able on my test system.
> 
> multipath.conf:
> 
> defaults {
> 	verbosity 3
> 	flush_on_last_del yes

If you set flush_on_last_del to "yes", then you won't be able to hit
this, because you will never be queueing when multipathd tries to
autoremove the device. The goal of my patch was to make sure multipathd
never hung on an autoremove, regardless of the no_path_retry setting and
the flush_on_last_del setting.

With "always", the device will always have queueing disabled, so the
device can be safely removed.

With "unused", if the device is unused, queueing is disabled. Otherwise,
multipathd will skip the autoremove if the device is queueing.

With "never", multipathd will skip the autoremove if the device is
queueing.
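
As a rough sketch of the three cases above (this is only an illustration
of the decision, not the code from the patch; the enum and function names
are made up for this example and are not libmultipath identifiers):

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not taken from libmultipath. */
enum flush_mode { FLUSH_NEVER, FLUSH_UNUSED, FLUSH_ALWAYS };

enum autoremove_action {
	AUTOREMOVE_SKIP,           /* device is queueing: leave it alone */
	AUTOREMOVE_REMOVE,         /* not queueing: attempt the removal */
	AUTOREMOVE_UNQUEUE_REMOVE  /* disable queueing, then attempt removal */
};

/*
 * Decide what to do when the last path of a device is deleted.
 * in_use:   the device is currently open or otherwise in use
 * queueing: the device currently has queueing enabled
 */
enum autoremove_action
autoremove_decision(enum flush_mode mode, bool in_use, bool queueing)
{
	/* "always": queueing is always disabled, so removal is safe */
	if (mode == FLUSH_ALWAYS)
		return AUTOREMOVE_UNQUEUE_REMOVE;

	/* "unused": queueing is disabled only if nothing uses the device */
	if (mode == FLUSH_UNUSED && !in_use)
		return AUTOREMOVE_UNQUEUE_REMOVE;

	/* "never", or "unused" on an in-use device: skip while queueing */
	return queueing ? AUTOREMOVE_SKIP : AUTOREMOVE_REMOVE;
}
```

In none of the three cases does the sketch attempt a removal while the
device is still queueing, which is the situation that led to the hang.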

Your script looks fine, but with a system set up to hit it, the bug
should occur every time.

-Ben

> }
> 
> blacklist {
> 	wwid QEMU
> }
> 
> overrides {
> 	no_path_retry queue
> }
> 
> Regards,
> Martin
> 
> 
> 