Re: [RFC v1 2/2] vfio-ccw: Don't exit early if state of the vfio-ccw subchannel is not idle

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 01/22/2019 06:06 AM, Halil Pasic wrote:
On Tue, 22 Jan 2019 11:20:41 +0100
Cornelia Huck <cohuck@xxxxxxxxxx> wrote:

On Mon, 21 Jan 2019 21:10:32 +0100
Halil Pasic <pasic@xxxxxxxxxxxxx> wrote:

On Mon, 21 Jan 2019 17:55:39 +0100
Cornelia Huck <cohuck@xxxxxxxxxx> wrote:

On Mon, 21 Jan 2019 11:14:16 -0500
Farhan Ali <alifm@xxxxxxxxxxxxx> wrote:
On 01/21/2019 10:52 AM, Cornelia Huck wrote:
On Mon, 21 Jan 2019 10:48:32 -0500
Farhan Ali <alifm@xxxxxxxxxxxxx> wrote:
On 01/21/2019 10:18 AM, Cornelia Huck wrote:
On Mon, 21 Jan 2019 09:54:09 -0500
Farhan Ali <alifm@xxxxxxxxxxxxx> wrote:
This check is unecessary as we already have the vfio state machine
to handle I/O requests.

On the other hand, this check returns incorrect information to
userspace if the state of the subchannel is not idle. For example
if the state is busy and new I/O request comes in, this will return
an EACCES, whereas we should return EBUSY.

Signed-off-by: Farhan Ali <alifm@xxxxxxxxxxxxx>
---
    drivers/s390/cio/vfio_ccw_ops.c | 2 --
    1 file changed, 2 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index f673e10..3fdcc6d 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -193,8 +193,6 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
    		return -EINVAL;
private = dev_get_drvdata(mdev_parent_dev(mdev));
-	if (private->state != VFIO_CCW_STATE_IDLE)
-		return -EACCES;
region = private->io_region;
    	if (copy_from_user((void *)region + *ppos, buf,
count))

Hm, the patchset for halt/clear handling I recently posted
changes this to a check for NOT_OPER || STANDBY. What do you
think of that option?

I am concerned with the return code that we send userspace. With
the state machines we return an EIO for NOT_OPER or STANDBY, but
we return EACCES in the early check. QEMU on an EACCES returns a
'not_oper' to the guest and for EIO will inject an interrupt.

I actually think that not_oper is the saner option here: the
device is in a state on the vfio side where it is not usable...

Agreed. Then we should the change the return codes with the state
machine to EACCES as well, that could be a follow up patch to your
series.

Yes, let's do that.

As you seem to be able to hit this problem with your workload, maybe
post it as a standalone fix? I can easily rebase my series on top of
that (and I assume that it will need more review than a simple fix.)

I had an issue following the discussion so I chatted up Farhan. Turns
out the bad private->state he sees is VFIO_CCW_STATE_BUSY. That is, I
guess, a should normally not happen. I mean QEMU should not accept a new
SSCH before the START_PENDING for the old one is cleared. And that would
normally happen after the IO corresponding to the old SSCH is done. Well
we do have our wonderful emulated CSCH/SSCH at the moment (which do clear
the SCSW in QEMU but don't bother to terminate IO at the device). And we
may or may not have bugs in all the SCSW handling/updating code as well.

Well, that hopefully becomes better once we have proper halt/clear
passthrough. Any kind of emulation will continue to be problematic in
one way or the other.

The problem with this patch is that we poke private->io_region before
calling the state machine stuff. That could race with the interrupt
handler touching the same region. That could have been OK, if IRB in the
region was read-only. But it is not, and QEMU zeroes it out on each
write AFAICT. Changing the condition to NOT_OPER || STANDBY or
omitting the check (at the moment) does not seem to be what I would call
a fix.

So, should this wait until after we rewrite the state machine and I/O
region concurrency?

I'm not sure what the test setup exercises, but making the return code
(-EACCES vs. -EIO) consistent looks like a win to me.


The test setup basically runs fio (sequential, random) read and write workloads for a long period of time. I run each workload at least for 5 minutes and keep running them in a continuous loop for few hours.

Also I am testing with Hyper PAV, so I run the workloads with different number of HPAVs for a base device.

I agree, we can wait till we have a proper solution for concurrency.



Regarding the test setup Farhan is likely to be able to provide more
info. And my intuition says, yes it should wait. I would also like to
see the error codes, which are:
* obviously a substantial part of the interface, and need to be mapped
   to channel IO conditions, and thus
* beyond usual file IO stuff (i.e. man 3 write won't give you the right
   explanation for the error codes
properly documented. @Farhan: Would you like to tackle this one?


Are you suggesting documenting the mapping of the error codes in the vfio-ccw doc? I could do that, but I fear the doc wouldn't be updated as often.

Given what QEMU does with EIO and EACCESS, I agree, if NOT_OPER ||
STANDBY EACCESS (that is cc 3 not operational) is more appropriate.


Regards,
Halil






[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux