Re: [RFC v1 2/3] vfio-ccw: Prevent quiesce function going into an infinite loop

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 04/03/2019 03:58 AM, Cornelia Huck wrote:
On Tue,  2 Apr 2019 12:44:34 -0400
Farhan Ali <alifm@xxxxxxxxxxxxx> wrote:

The quiesce function calls cio_cancel_halt_clear() and if we
get an -EBUSY we go into a loop where we:
	- wait for any interrupts
	- flush all I/O in the workqueue
	- retry cio_cancel_halt_clear

During the period where we are waiting for interrupts or
flushing all I/O, the channel subsystem could have completed
a halt/clear action and turned off the corresponding activity
control bits in the subchannel status word. This means the next
time we call cio_cancel_halt_clear(), we will again start by
calling cancel subchannel and so we can be stuck between calling
cancel and halt forever.

Rather than calling cio_cancel_halt_clear() immediately after
waiting, let's try to disable the subchannel. If we succeed in
disabling the subchannel then we know nothing else can happen
with the device.

Hmm... I had to spend some time looking at the code and it seems to be
a bit confused (not your patch, the general logic).

Basically, we call the quiesce function for two reasons:
- The device is shutdown/removed. We don't want to do anything with the
   device afterwards, and especially want nobody to start anything new.
   The subchannel will be disabled, and stay like that.
- The mdev reset callback is invoked. When resetting, we basically
   disable the subchannel (any I/O of course needs to be done prior to
   that), and then enable it again.

In the first case, our goal is to disable the subchannel, and nothing
more will be done with it.

In the second case, we simply want to perform a reset operation; that
is, get the subchannel into a clean state. The disable/enable is just a
means to get there.
 > Looking at what the common I/O layer does with the cancel_halt_clear
operation, it moves the device into a special quiescing state so that
no new I/O will be started. And I think that's how it is intended to be
used: If we want to quiesce the subchannel prior to remove/shutdown,
fencing off any new I/O is what we want. That means nobody will submit
requests we're not aware of.

Agreed. Under normal conditions, if someone explicitly requested the removal of the device we can safely assume that they are no more I/O request sent from the userspace.


That leaves the reset case, in which disabling the subchannel is only a
means to reset the subchannel to a defined state. We definitely want to
accept new I/O requests once we're done with the disable/enable dance.
That leaves the question of what to do with requests that come in while
resetting: Reject them? Queue them for later? Is disable/enable even
the best solution here? We basically came up with it because we
couldn't think of anything else...


So right now the reset callback is invoked through an ioctl. Would a sane userspace try to issue read/write request before the completion of reset? I am assuming that if userspace is requesting a device reset, it will at least wait for the reset to complete?


(And yes, I also find it confusing that the quiesce function not only
clears out pending I/O, but also disables the subchannel...)


Suggested-by: Eric Farman <farman@xxxxxxxxxxxxx>
Signed-off-by: Farhan Ali <alifm@xxxxxxxxxxxxx>
---
  drivers/s390/cio/vfio_ccw_drv.c | 28 ++++++++++++----------------
  1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index 5aca475..f87757b 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -43,26 +43,22 @@ int vfio_ccw_sch_quiesce(struct subchannel *sch)
  	if (ret != -EBUSY)
  		goto out_unlock;
+ iretry = 255;
  	do {
-		iretry = 255;
- ret = cio_cancel_halt_clear(sch, &iretry);
-		while (ret == -EBUSY) {
-			/*
-			 * Flush all I/O and wait for
-			 * cancel/halt/clear completion.
-			 */
-			private->completion = &completion;
-			spin_unlock_irq(sch->lock);
+		cio_cancel_halt_clear(sch, &iretry);

I think you still want to check the return code here...

+		/*
+		 * Flush all I/O and wait for
+		 * cancel/halt/clear completion.
+		 */
+		private->completion = &completion;
+		spin_unlock_irq(sch->lock);
- wait_for_completion_timeout(&completion, 3*HZ);
-
-			private->completion = NULL;
-			flush_workqueue(vfio_ccw_work_q);
-			spin_lock_irq(sch->lock);
-			ret = cio_cancel_halt_clear(sch, &iretry);
-		};
+		wait_for_completion_timeout(&completion, 3*HZ);

...because you don't want to wait for an interrupt that won't arrive
(e.g. when xsch was successful, or if the subchannel has become
non-operational in the meantime.)


Hmm, this makes sense. I think we probably could wrap the waiting within an if condition, similar to io_subchannel_quiesce()?

+ private->completion = NULL;
+		flush_workqueue(vfio_ccw_work_q);
+		spin_lock_irq(sch->lock);
  		ret = cio_disable_subchannel(sch);
  	} while (ret == -EBUSY);
  out_unlock:

Regardless of my comments above, this looks like an improvement over
what we have now. Have you been able to observe the issue in real life?


Yes, I have observed this in my testing. I think if you try to remove the mediated device (echo 1 > /path/to/mdev/remove) while a guest is running I/O, there is a good chance you will enter in this infinite loop.

Thanks for taking a look at the patches.

Thanks
Farhan




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Kernel Development]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Info]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Linux Media]     [Device Mapper]

  Powered by Linux