Re: [PATCH 0/7] Fix muitpath/multipathd flush issue

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 01, 2020 at 10:54:34PM +0200, Martin Wilck wrote:
> On Thu, 2020-06-18 at 18:06 -0500, Benjamin Marzinski wrote:
> > 
> > I uploaded the test program, aio_test:
> > 
> > https://github.com/bmarzins/test_programs.git
> > 
> > You just need to run in on a queueing multipath device with no active
> > paths and an open count of 0. It will hang with the device open.
> > Restore
> > a path, and it will exit, and the open count will go to 0.
> 
> Tried it now, it behaves as you say. I admit I can't imagine how the
> suspend/resume would improve anything here. Indeed, as you say, it opens 
> up a race window. Another process might open the device while
> it's suspended. Worse perhaps, once the device is resumed, an uevent will be 
> generated, and stacked devices might (in principle at least) be recreated
> before we get down to flush the map.
> 
> MAYBE the suspend/resume was necessary in the past because some earlier 
> kernels wouldn't immediately fail all outstanding commands when 
> queue_if_no_path was deactivated? (just speculating, I don't know if this
> is the case).

If you disable queue_if_no_path and then do a suspend with flushing, you
are guaranteed that the supend won't return until all the IO has
completed or failed.  This would allow anything that was waiting on
queued IO to have the IO failback, which could allow it to close the
device in time for multipath to be able to remove it (obviously this is
racey).  However, this assumes that you do your open checks after the
suspend, which multipath no longer does. I realize that multipath can't
suspend until after it tries to remove the partition devices, otherwise
those can get stuck. But there probably is some order that gets this all
right-ish.

So, for a while now, the suspending has been doing nothing for us.  We
could either try to reorder things so that we actually try to flush the
queued IOs back first (with or without suspending), or we could just
remove suspending and say that things are working fine the way they
currently are.

-Ben
 
> Regards,
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107
> SUSE  Software Solutions Germany GmbH
> HRB 36809, AG Nürnberg GF: Felix
> Imendörffer
> 
> 

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel





[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux