Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling

Eric Farman <farman@xxxxxxxxxxxxx> · Fri, 25 Jan 2019 15:22:56 -0500

On 01/25/2019 05:24 AM, Cornelia Huck wrote:
On Thu, 24 Jan 2019 21:37:44 -0500
Eric Farman <farman@xxxxxxxxxxxxx> wrote:

On 01/24/2019 09:25 PM, Eric Farman wrote:


On 01/21/2019 06:03 AM, Cornelia Huck wrote:

[1] I think these changes are cool.  We end up going into (and staying
in) state=BUSY if we get cc=0 on the SSCH, rather than in/out as we
bumble along.

But why can't these be separated out from this patch?  It does change
the behavior of the state machine, and seem distinct from the addition
of the mutex you otherwise add here?  At the very least, this behavior
change should be documented in the commit since it's otherwise lost in
the mutex/EAGAIN stuff.

That's a very good idea. I'll factor them out into a separate patch.

   
       trace_vfio_ccw_io_fctl(scsw->cmd.fctl, get_schid(private),
                      io_region->ret_code, errstr);
   }

diff --git a/drivers/s390/cio/vfio_ccw_ops.c
b/drivers/s390/cio/vfio_ccw_ops.c
index f673e106c041..3fa9fc570400 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -169,16 +169,20 @@ static ssize_t vfio_ccw_mdev_read(struct
mdev_device *mdev,
   {
       struct vfio_ccw_private *private;
       struct ccw_io_region *region;
+    int ret;
       if (*ppos + count > sizeof(*region))
           return -EINVAL;
       private = dev_get_drvdata(mdev_parent_dev(mdev));
+    mutex_lock(&private->io_mutex);
       region = private->io_region;
       if (copy_to_user(buf, (void *)region + *ppos, count))
-        return -EFAULT;
-
-    return count;
+        ret = -EFAULT;
+    else
+        ret = count;
+    mutex_unlock(&private->io_mutex);
+    return ret;
   }
   static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
@@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct
mdev_device *mdev,
   {
       struct vfio_ccw_private *private;
       struct ccw_io_region *region;
+    int ret;
       if (*ppos + count > sizeof(*region))
           return -EINVAL;
       private = dev_get_drvdata(mdev_parent_dev(mdev));
-    if (private->state != VFIO_CCW_STATE_IDLE)
+    if (private->state == VFIO_CCW_STATE_NOT_OPER ||
+        private->state == VFIO_CCW_STATE_STANDBY)
           return -EACCES;
+    if (!mutex_trylock(&private->io_mutex))
+        return -EAGAIN;

Ah, I see Halil's difficulty here.

It is true there is a race condition today, and that this doesn't
address it.  That's fine, add it to the todo list.  But even with that,
I don't see what the mutex is enforcing?  Two simultaneous SSCHs will be
serialized (one will get kicked out with a failed trylock() call), while
still leaving the window open between cc=0 on the SSCH and the
subsequent interrupt.  In the latter case, a second SSCH will come
through here, do the copy_from_user below, and then jump to fsm_io_busy
to return EAGAIN.  Do we really want to stomp on io_region in that case?
   Why can't we simply return EAGAIN if state==BUSY?

(Answering my own questions as I skim patch 5...)

Because of course this series is for async handling, while I was looking
specifically at the synchronous code that exists today.  I guess then my
question just remains on how the mutex is adding protection in the sync
case, because that's still not apparent to me.  (Perhaps I missed it in
a reply to Halil; if so I apologize, there were a lot when I returned.)

My idea behind the mutex was to make sure that we get consistent data
when reading/writing (e.g. if one user space thread is reading the I/O
region while another is writing to it).


And from that angle, this accomplishes that.  It just wasn't apparent to 
me at first.

I'm still not certain of how we handle mdev_write when state=BUSY, so 
let me ask my question a different way...

If we come into mdev_write with state=BUSY and we get the lock, 
copy_from_user, and do our jump table we go to fsm_io_busy to set 
ret_code and return -EAGAIN.  Why then don't we set the jump table for 
state=NOT_OPER||STANDBY to do something that will return -EACCES instead 
of how we currently do a direct return of -EACCES before all the 
lock/copy stuff (and the jump table that would take us to fsm_io_error 
and an error message before returning -EIO)?