On Tue, 2024-04-30 at 17:29 -0400, Benjamin Marzinski wrote:
> On Tue, Apr 30, 2024 at 07:06:24PM +0200, Martin Wilck wrote:
> > On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote:
> > > 
> > > 1. create a multipath device with a kpartx partition on top of it
> > >    and no_path_retry set to either "queue" or something long enough
> > >    to run all the commands in the reproducer before it disables
> > >    queueing.
> > > 2. disable all the paths to the device with something like:
> > >    # echo offline > /sys/block/<path_dev>/device/state
> > > 3. Write directly to the multipath device with something like:
> > >    # dd if=/dev/zero of=/dev/mapper/<mpath_dev> bs=4K count=1
> > > 4. delete all the paths to the device with something like:
> > >    # echo 1 > /sys/block/<path_dev>/device/delete
> > 
> > I've tried to reproduce the issue with these commands. Test system
> > was using a LIO iSCSI target with 2 paths. I created a test script
> > (attached) to try the offline / IO / delete procedure repeatedly.
> > I haven't been able to make multipathd hang even once.
> > 
> > I also played around with dd options. If I use oflag=sync or
> > oflag=direct, the dd command itself hangs.
> > 
> > Did I set up anything wrongly, or does the behavior perhaps depend
> > on the kernel, or something else perhaps? Mine was a 6.4 kernel.
> > This is not to say there's something wrong with your patch, but I'd
> > like to understand the error situation better, as it doesn't seem
> > to be trigger-able on my test system.
> > 
> > multipath.conf:
> > 
> > defaults {
> >         verbosity 3
> >         flush_on_last_del yes
> 
> If you set flush_on_last_del to "yes", then you won't be able to hit
> this, because you will never be queueing when multipathd tries to
> autoremove the device. The goal of my patch was to make sure
> multipathd never hung on an autoremove, regardless of the
> no_path_retry setting and the flush_on_last_del setting

Stupid me.
In my defense, I'd set "flush_on_last_del yes" because I had previously
been unable to reproduce the multipathd hang with the default setting
"flush_on_last_del no", and thought I'd misunderstood something about
flush_on_last_del. But apparently I'd made some other mistake at that
point, which kept the issue from reproducing.

I just set "flush_on_last_del no" and indeed reproduced the issue with
my script, immediately.

Thanks,
Martin
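
For reference, the offline / IO / delete sequence quoted above can be sketched roughly as below. This is not the attached test script, only a minimal illustration of steps 2 to 4. The function names and the DRY_RUN switch are made up for this sketch, and the map and path device names are placeholders to be replaced with real ones. It assumes the map was already set up as in step 1, with no_path_retry queueing and "flush_on_last_del no".

```shell
#!/bin/sh
# Hypothetical sketch of the quoted reproducer (steps 2 to 4).
# DRY_RUN=1 (the default here) only prints the commands; DRY_RUN=0 runs
# them for real, which needs root and takes the paths away.
DRY_RUN="${DRY_RUN:-1}"

# Write a value to a sysfs attribute, or just print the command.
sysfs_write() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "echo $1 > $2"
    else
        echo "$1" > "$2"
    fi
}

# usage: reproduce <mpath_dev> <path_dev>...
reproduce() {
    mpath_dev="$1"
    shift

    # Step 2: set every path device offline.
    for p in "$@"; do
        sysfs_write offline "/sys/block/$p/device/state"
    done

    # Step 3: one buffered write to the multipath device. No oflag=sync or
    # oflag=direct, since with those the dd command itself hangs.
    if [ "$DRY_RUN" = 1 ]; then
        echo "dd if=/dev/zero of=/dev/mapper/$mpath_dev bs=4K count=1"
    else
        dd if=/dev/zero of="/dev/mapper/$mpath_dev" bs=4K count=1
    fi

    # Step 4: delete every path device.
    for p in "$@"; do
        sysfs_write 1 "/sys/block/$p/device/delete"
    done
}
```

With the functions loaded into a shell, "reproduce mpatha sdb sdc" prints the command sequence for a hypothetical map mpatha with paths sdb and sdc. Setting DRY_RUN=0 beforehand executes it instead.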