On Tue, 22 Jan 2019 13:46:12 +0100 Halil Pasic <pasic@xxxxxxxxxxxxx> wrote: > On Tue, 22 Jan 2019 12:53:22 +0100 > Cornelia Huck <cohuck@xxxxxxxxxx> wrote: > > > On Tue, 22 Jan 2019 12:17:37 +0100 > > Halil Pasic <pasic@xxxxxxxxxxxxx> wrote: > > > > > On Tue, 22 Jan 2019 11:29:26 +0100 > > > Cornelia Huck <cohuck@xxxxxxxxxx> wrote: > > > > > > > On Mon, 21 Jan 2019 21:20:18 +0100 > > > > Halil Pasic <pasic@xxxxxxxxxxxxx> wrote: > > > > > > > > > On Mon, 21 Jan 2019 12:03:51 +0100 > > > > > Cornelia Huck <cohuck@xxxxxxxxxx> wrote: > > > > > > > > > > > Rework handling of multiple I/O requests to return -EAGAIN if > > > > > > we are already processing an I/O request. Introduce a mutex > > > > > > to disallow concurrent writes to the I/O region. > > > > > > > > > > > > The expectation is that userspace simply retries the operation > > > > > > if it gets -EAGAIN. > > > > > > > > > > > > We currently don't allow multiple ssch requests at the same > > > > > > time, as we don't have support for keeping channel programs > > > > > > around for more than one request. > > > > > > > > > > > > Signed-off-by: Cornelia Huck <cohuck@xxxxxxxxxx> > > > > > > --- > > > > > > > > > > [..] > > > > > > > > > > > static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev, > > > > > > @@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev, > > > > > > { > > > > > > struct vfio_ccw_private *private; > > > > > > struct ccw_io_region *region; > > > > > > + int ret; > > > > > > > > > > > > if (*ppos + count > sizeof(*region)) > > > > > > return -EINVAL; > > > > > > > > > > > > private = dev_get_drvdata(mdev_parent_dev(mdev)); > > > > > > - if (private->state != VFIO_CCW_STATE_IDLE) > > > > > > + if (private->state == VFIO_CCW_STATE_NOT_OPER || > > > > > > + private->state == VFIO_CCW_STATE_STANDBY) > > > > > > return -EACCES; > > > > > > + if (!mutex_trylock(&private->io_mutex)) > > > > > > + return -EAGAIN; > > > > > > > > > > > > region = private->io_region; > > > > > > - if (copy_from_user((void *)region + *ppos, buf, count)) > > > > > > - return -EFAULT; > > > > > > + if (copy_from_user((void *)region + *ppos, buf, count)) { > > > > > > > > > > This might race with vfio_ccw_sch_io_todo() on > > > > > private->io_region->irb_area, or? > > > > > > > > Ah yes, this should also take the mutex (should work because we're on a > > > > workqueue). > > > > > > > > > > I'm not sure that will do the trick (assumed I understood the > > > intention correctly). Let's say the things happen in this order: > > > 1) vfio_ccw_sch_io_todo() goes first, I guess updates > > > private->io_region->irb_area and releases the mutex. > > > 2) Then vfio_ccw_mdev_write() destroys the irb_area by zeriong it out, > > > and finally, > > > 3) userspace reads the destroyed irb_area using vfio_ccw_mdev_read(). > > > > > > Or am I misunderstanding something? > > > > You're not, but dealing with that race is outside the scope of this > > patch. If userspace submits a request and then tries to get the old > > data for a prior request, I suggest that userspace needs to fix their > > sequencing. > > > > I tend to disagree, because I think this would be a degradation compared > to what we have right now. > > Let me explain. I guess the current idea is that the private->state != > VFIO_CCW_STATE_IDLE check safeguards against this. Yes we lack proper > synchronization (atomic/interlocked access or locks) that would guarantee > that different thread observe state transitions as required -- no > splint brain. But if state were atomic the scenario I have in mind can > not happen, because we get the solicited interrupt in BUSY state (and > set IDLE in vfio_ccw_sch_io_todo()). This BUSY handling is broken for another case: If the guest requests intermediate interrupts, there may be more than one interrupt by the hardware -- and we still go out of BUSY state. (Freeing the cp is also broken in that case.) However, the Linux dasd driver does not seem to do that. > Unsolicited interrupts are another > piece of cake -- I have no idea how may of those do we get. They at least don't have the "free the cp before we got final state" bug. But I think both are reasons to get away from "use the BUSY state to ensure the right sequence". > And because > of this the broken sequencing in userspace could actually be the kernels > fault. Here, I can't follow you at all :(