Re: [PATCH v2 2/5] libmultipath: change flush_on_last_del to fix a multipathd hang

On Tue, 2024-04-30 at 17:29 -0400, Benjamin Marzinski wrote:
> On Tue, Apr 30, 2024 at 07:06:24PM +0200, Martin Wilck wrote:
> > On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote:
> > > 
> > > 1. Create a multipath device with a kpartx partition on top of it and
> > > no_path_retry set to either "queue" or something long enough to run all
> > > the commands in the reproducer before it disables queueing.
> > > 2. Disable all the paths to the device with something like:
> > >  # echo offline > /sys/block/<path_dev>/device/state
> > > 3. Write directly to the multipath device with something like:
> > >  # dd if=/dev/zero of=/dev/mapper/<mpath_dev> bs=4K count=1
> > > 4. Delete all the paths to the device with something like:
> > >  # echo 1 > /sys/block/<path_dev>/device/delete
> > 
> > I've tried to reproduce the issue with these commands. The test system
> > was using a LIO iSCSI target with 2 paths. I created a test script
> > (attached) to try the offline / IO / delete procedure repeatedly.
> > I haven't been able to make multipathd hang even once.
> > 
> > I also played around with dd options. If I use oflag=sync or
> > oflag=direct, the dd command itself hangs.
> > 
> > Did I set up anything wrongly, or does the behavior perhaps depend on
> > the kernel or something else? Mine was a 6.4 kernel. This is not to
> > say there's something wrong with your patch, but I'd like to
> > understand the error situation better, as it doesn't seem to be
> > triggerable on my test system.
> > 
> > multipath.conf:
> > 
> > defaults {
> > 	verbosity 3
> > 	flush_on_last_del yes
> 
> If you set flush_on_last_del to "yes", then you won't be able to hit
> this, because you will never be queueing when multipathd tries to
> autoremove the device. The goal of my patch was to make sure multipathd
> never hung on an autoremove, regardless of the no_path_retry setting
> and the flush_on_last_del setting.

Stupid me. In my defense, I'd set "flush_on_last_del yes" because I
previously had been unable to reproduce the multipathd hang with the
default setting "flush_on_last_del no", and thought I'd misunderstood
something about flush_on_last_del. Apparently I'd made some other mistake
at that point, which kept the issue from reproducing.

I just set "flush_on_last_del no" and indeed reproduced the issue with
my script immediately.
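
For reference, a minimal defaults section for a reproducing setup might
look roughly like the sketch below. Only "flush_on_last_del no" is
confirmed above and "verbosity 3" is carried over from the earlier
config; "no_path_retry queue" is an assumption based on step 1 of the
reproducer, and the rest of the test configuration is not shown.

defaults {
	verbosity 3
	# "no" is the setting that reproduced the hang
	flush_on_last_del no
	# assumption: queue indefinitely, per step 1 of the reproducer
	no_path_retry queue
}

With "flush_on_last_del yes" instead, queueing is already disabled by
the time multipathd tries to autoremove the device, so the hang does not
show up.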

Thanks,
Martin
