On Wed, 3 Apr 2019 11:06:14 -0400
Farhan Ali <alifm@xxxxxxxxxxxxx> wrote:

> On 04/03/2019 03:58 AM, Cornelia Huck wrote:
> > On Tue, 2 Apr 2019 12:44:34 -0400
> > Farhan Ali <alifm@xxxxxxxxxxxxx> wrote:
> >
> >> The quiesce function calls cio_cancel_halt_clear(), and if we
> >> get an -EBUSY we go into a loop where we:
> >> - wait for any interrupts
> >> - flush all I/O in the workqueue
> >> - retry cio_cancel_halt_clear
> >>
> >> During the period where we are waiting for interrupts or
> >> flushing all I/O, the channel subsystem could have completed
> >> a halt/clear action and turned off the corresponding activity
> >> control bits in the subchannel status word. This means the next
> >> time we call cio_cancel_halt_clear(), we will again start by
> >> calling cancel subchannel, and so we can be stuck between calling
> >> cancel and halt forever.
> >>
> >> Rather than calling cio_cancel_halt_clear() immediately after
> >> waiting, let's try to disable the subchannel. If we succeed in
> >> disabling the subchannel then we know nothing else can happen
> >> with the device.
> >
> > Hmm... I had to spend some time looking at the code and it seems to be
> > a bit confused (not your patch, the general logic).
> >
> > Basically, we call the quiesce function for two reasons:
> > - The device is shutdown/removed. We don't want to do anything with the
> >   device afterwards, and especially want nobody to start anything new.
> >   The subchannel will be disabled, and stay like that.
> > - The mdev reset callback is invoked. When resetting, we basically
> >   disable the subchannel (any I/O of course needs to be done prior to
> >   that), and then enable it again.
> >
> > In the first case, our goal is to disable the subchannel, and nothing
> > more will be done with it.
> >
> > In the second case, we simply want to perform a reset operation; that
> > is, get the subchannel into a clean state. The disable/enable is just a
> > means to get there.
> >
> > Looking at what the common I/O layer does with the cancel_halt_clear
> > operation, it moves the device into a special quiescing state so that
> > no new I/O will be started. And I think that's how it is intended to be
> > used: If we want to quiesce the subchannel prior to remove/shutdown,
> > fencing off any new I/O is what we want. That means nobody will submit
> > requests we're not aware of.
>
> Agreed. Under normal conditions, if someone explicitly requested the
> removal of the device, we can safely assume that there are no more I/O
> requests sent from userspace.

Yes, and we would not lose any functionality if we actively fence this.

> >
> > That leaves the reset case, in which disabling the subchannel is only a
> > means to reset the subchannel to a defined state. We definitely want to
> > accept new I/O requests once we're done with the disable/enable dance.
> > That leaves the question of what to do with requests that come in while
> > resetting: Reject them? Queue them for later? Is disable/enable even
> > the best solution here? We basically came up with it because we
> > couldn't think of anything else...
> >
>
> So right now the reset callback is invoked through an ioctl. Would a
> sane userspace try to issue read/write requests before the completion
> of the reset? I am assuming that if userspace is requesting a device
> reset, it will at least wait for the reset to complete?

It depends on the semantics of reset, which I am not completely clear
about, TBH. It's probably ok to fence here as well, maybe with a
different error ("try again later" vs. "the device is going away
anyway" in the first case).
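Very roughly, and just to illustrate what I mean by fencing with
different errors: in the write path we could do something like the
following. This is pseudo-code only; the state names are made up for
the example and are not actual vfio-ccw states.

	/* Placeholder states, purely for illustration: */
	if (private->state == STATE_GOING_AWAY)	/* shutdown/remove */
		return -ENODEV;	/* the device is going away anyway */
	if (private->state == STATE_RESETTING)	/* reset in progress */
		return -EAGAIN;	/* try again once the reset is done */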
"the device is going away anyway" in the first case). > > > > (And yes, I also find it confusing that the quiesce function not only > > clears out pending I/O, but also disables the subchannel...) > > > >> > >> Suggested-by: Eric Farman <farman@xxxxxxxxxxxxx> > >> Signed-off-by: Farhan Ali <alifm@xxxxxxxxxxxxx> > >> --- > >> drivers/s390/cio/vfio_ccw_drv.c | 28 ++++++++++++---------------- > >> 1 file changed, 12 insertions(+), 16 deletions(-) > >> > >> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c > >> index 5aca475..f87757b 100644 > >> --- a/drivers/s390/cio/vfio_ccw_drv.c > >> +++ b/drivers/s390/cio/vfio_ccw_drv.c > >> @@ -43,26 +43,22 @@ int vfio_ccw_sch_quiesce(struct subchannel *sch) > >> if (ret != -EBUSY) > >> goto out_unlock; > >> > >> + iretry = 255; > >> do { > >> - iretry = 255; > >> > >> - ret = cio_cancel_halt_clear(sch, &iretry); > >> - while (ret == -EBUSY) { > >> - /* > >> - * Flush all I/O and wait for > >> - * cancel/halt/clear completion. > >> - */ > >> - private->completion = &completion; > >> - spin_unlock_irq(sch->lock); > >> + cio_cancel_halt_clear(sch, &iretry); > > > > I think you still want to check the return code here... > > > >> + /* > >> + * Flush all I/O and wait for > >> + * cancel/halt/clear completion. > >> + */ > >> + private->completion = &completion; > >> + spin_unlock_irq(sch->lock); > >> > >> - wait_for_completion_timeout(&completion, 3*HZ); > >> - > >> - private->completion = NULL; > >> - flush_workqueue(vfio_ccw_work_q); > >> - spin_lock_irq(sch->lock); > >> - ret = cio_cancel_halt_clear(sch, &iretry); > >> - }; > >> + wait_for_completion_timeout(&completion, 3*HZ); > > > > ...because you don't want to wait for an interrupt that won't arrive > > (e.g. when xsch was successful, or if the subchannel has become > > non-operational in the meantime.) > > > > Hmm, this makes sense. I think we probably could wrap the waiting within > an if condition, similar to io_subchannel_quiesce()? Probably yes. > > >> > >> + private->completion = NULL; > >> + flush_workqueue(vfio_ccw_work_q); > >> + spin_lock_irq(sch->lock); > >> ret = cio_disable_subchannel(sch); > >> } while (ret == -EBUSY); > >> out_unlock: > > > > Regardless of my comments above, this looks like an improvement over > > what we have now. Have you been able to observe the issue in real life? > > > > > Yes, I have observed this in my testing. I think if you try to remove > the mediated device (echo 1 > /path/to/mdev/remove) while a guest is > running I/O, there is a good chance you will enter in this infinite loop. Good to know, a testcase is always helpful :) > > Thanks for taking a look at the patches. > > Thanks > Farhan >