On Mon, 21 Jan 2019 21:10:32 +0100 Halil Pasic <pasic@xxxxxxxxxxxxx> wrote: > On Mon, 21 Jan 2019 17:55:39 +0100 > Cornelia Huck <cohuck@xxxxxxxxxx> wrote: > > > On Mon, 21 Jan 2019 11:14:16 -0500 > > Farhan Ali <alifm@xxxxxxxxxxxxx> wrote: > > > > > On 01/21/2019 10:52 AM, Cornelia Huck wrote: > > > > On Mon, 21 Jan 2019 10:48:32 -0500 > > > > Farhan Ali <alifm@xxxxxxxxxxxxx> wrote: > > > > > > > >> On 01/21/2019 10:18 AM, Cornelia Huck wrote: > > > >>> On Mon, 21 Jan 2019 09:54:09 -0500 > > > >>> Farhan Ali <alifm@xxxxxxxxxxxxx> wrote: > > > >>> > > > >>>> This check is unecessary as we already have the vfio state machine > > > >>>> to handle I/O requests. > > > >>>> > > > >>>> On the other hand, this check returns incorrect information to > > > >>>> userspace if the state of the subchannel is not idle. For example > > > >>>> if the state is busy and new I/O request comes in, this will return > > > >>>> an EACCES, whereas we should return EBUSY. > > > >>>> > > > >>>> Signed-off-by: Farhan Ali <alifm@xxxxxxxxxxxxx> > > > >>>> --- > > > >>>> drivers/s390/cio/vfio_ccw_ops.c | 2 -- > > > >>>> 1 file changed, 2 deletions(-) > > > >>>> > > > >>>> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c > > > >>>> index f673e10..3fdcc6d 100644 > > > >>>> --- a/drivers/s390/cio/vfio_ccw_ops.c > > > >>>> +++ b/drivers/s390/cio/vfio_ccw_ops.c > > > >>>> @@ -193,8 +193,6 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev, > > > >>>> return -EINVAL; > > > >>>> > > > >>>> private = dev_get_drvdata(mdev_parent_dev(mdev)); > > > >>>> - if (private->state != VFIO_CCW_STATE_IDLE) > > > >>>> - return -EACCES; > > > >>>> > > > >>>> region = private->io_region; > > > >>>> if (copy_from_user((void *)region + *ppos, buf, > > > >>>> count)) > > > >>> > > > >>> Hm, the patchset for halt/clear handling I recently posted > > > >>> changes this to a check for NOT_OPER || STANDBY. What do you > > > >>> think of that option? > > > >>> > > > >>> > > > >> I am concerned with the return code that we send userspace. With > > > >> the state machines we return an EIO for NOT_OPER or STANDBY, but > > > >> we return EACCES in the early check. QEMU on an EACCES returns a > > > >> 'not_oper' to the guest and for EIO will inject an interrupt. > > > > > > > > I actually think that not_oper is the saner option here: the > > > > device is in a state on the vfio side where it is not usable... > > > > > > Agreed. Then we should the change the return codes with the state > > > machine to EACCES as well, that could be a follow up patch to your > > > series. > > > > Yes, let's do that. > > > > As you seem to be able to hit this problem with your workload, maybe > > post it as a standalone fix? I can easily rebase my series on top of > > that (and I assume that it will need more review than a simple fix.) > > > > I had an issue following the discussion so I chatted up Farhan. Turns > out the bad private->state he sees is VFIO_CCW_STATE_BUSY. That is, I > guess, a should normally not happen. I mean QEMU should not accept a new > SSCH before the START_PENDING for the old one is cleared. And that would > normally happen after the IO corresponding to the old SSCH is done. Well > we do have our wonderful emulated CSCH/SSCH at the moment (which do clear > the SCSW in QEMU but don't bother to terminate IO at the device). And we > may or may not have bugs in all the SCSW handling/updating code as well. Well, that hopefully becomes better once we have proper halt/clear passthrough. Any kind of emulation will continue to be problematic in one way or the other. > The problem with this patch is that we poke private->io_region before > calling the state machine stuff. That could race with the interrupt > handler touching the same region. That could have been OK, if IRB in the > region was read-only. But it is not, and QEMU zeroes it out on each > write AFAICT. Changing the condition to NOT_OPER || STANDBY or > omitting the check (at the moment) does not seem to be what I would call > a fix. So, should this wait until after we rewrite the state machine and I/O region concurrency? I'm not sure what the test setup exercises, but making the return code (-EACCES vs. -EIO) consistent looks like a win to me.