On Wed, 2021-04-21 at 12:25 +0200, Cornelia Huck wrote:
> On Tue, 13 Apr 2021 20:24:10 +0200
> Eric Farman <farman@xxxxxxxxxxxxx> wrote:
>
> > Today, the stacked call to vfio_ccw_sch_io_todo() does three
> > things:
> >
> > 1) Update a solicited IRB with CP information, and release the CP
> > if the interrupt was the end of a START operation.
> > 2) Copy the IRB data into the io_region, under the protection of
> > the io_mutex
> > 3) Reset the vfio-ccw FSM state to IDLE to acknowledge that
> > vfio-ccw can accept more work.
> >
> > The trouble is that step 3 is (A) invoked for both solicited and
> > unsolicited interrupts, and (B) sitting after the mutex for step 2.
> > This second piece becomes a problem if it processes an interrupt
> > for a CLEAR SUBCHANNEL while another thread initiates a START,
> > thus allowing the CP and FSM states to get out of sync. That is:
> >
> >     CPU 1                           CPU 2
> >     fsm_do_clear()
> >     fsm_irq()
> >                                     fsm_io_request()
> >                                       fsm_io_helper()
> >     vfio_ccw_sch_io_todo()
> >                                     fsm_irq()
> >                                     vfio_ccw_sch_io_todo()
> >
> > Let's move the reset of the FSM state to the point where the
> > channel_program struct is cleaned up, which is only done for
> > solicited interrupts anyway.
> >
> > Signed-off-by: Eric Farman <farman@xxxxxxxxxxxxx>
> > ---
> >  drivers/s390/cio/vfio_ccw_drv.c | 7 +++----
> >  1 file changed, 3 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/s390/cio/vfio_ccw_drv.c
> > b/drivers/s390/cio/vfio_ccw_drv.c
> > index 8c625b530035..e51318f23ca8 100644
> > --- a/drivers/s390/cio/vfio_ccw_drv.c
> > +++ b/drivers/s390/cio/vfio_ccw_drv.c
> > @@ -94,16 +94,15 @@ static void vfio_ccw_sch_io_todo(struct
> > work_struct *work)
> >                           (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
> >          if (scsw_is_solicited(&irb->scsw)) {
> >                  cp_update_scsw(&private->cp, &irb->scsw);
> > -                if (is_final && private->state ==
> > VFIO_CCW_STATE_CP_PENDING)
> > +                if (is_final && private->state ==
> > VFIO_CCW_STATE_CP_PENDING) {
> >                          cp_free(&private->cp);
> > +                        private->state = VFIO_CCW_STATE_IDLE;
> > +                }
> >          }
> >          mutex_lock(&private->io_mutex);
> >          memcpy(private->io_region->irb_area, irb, sizeof(*irb));
> >          mutex_unlock(&private->io_mutex);
> >
> > -        if (private->mdev && is_final)
> > -                private->state = VFIO_CCW_STATE_IDLE;
>
> Isn't that re-allowing new I/O requests a bit too early?

Hrm... I guess I don't see what work vfio-ccw has left to do that is
preventing it from carrying on. The copying of the IRB data back into
the io_region seems like a flimsy gate to me.

But... It seems you're (rightly) concerned with userspace doing
SSCH + SSCH, whereas I've been focused on the CSCH + SSCH sequence. So
with this change, we're inviting the possibility of a second SSCH being
able to be submitted/started before the IRB data for the first SSCH is
copied (and presumably before userspace is tapped to read that data
back). Sigh... I guess that's not the greatest behavior either. Gotta
ruminate on this.

> Maybe remember
> that we had a final I/O interrupt for an I/O request and only change
> the state in this case?

As a local flag within this routine? Hrm...
I have entirely too many "Let's try this" branches that didn't work,
but I don't see that one jumping out at me. Will give it a try.

> >
> > -
> >          if (private->io_trigger)
> >                  eventfd_signal(private->io_trigger, 1);
> >  }
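
For the sake of discussion, the "local flag" idea could look roughly
like the userspace model below. This is only a sketch under
assumptions: the model_* names, the cp_is_finished flag, and the
simplified structs are hypothetical stand-ins rather than the driver's
real definitions, and a pthread mutex stands in for io_mutex. It shows
the ordering only (note that the request finished, copy the IRB under
the mutex, and only then go back to IDLE); being single-threaded, it
does not exercise the race itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the driver's types; not the kernel definitions. */
enum model_state { MODEL_IDLE, MODEL_CP_PENDING };

struct model_irb {
	bool solicited;
	bool final;
	unsigned char data[8];
};

struct model_private {
	enum model_state state;
	bool cp_allocated;
	pthread_mutex_t io_mutex;
	struct model_irb irb;      /* IRB captured by the interrupt handler */
	struct model_irb irb_area; /* stands in for io_region->irb_area */
};

static void model_io_todo(struct model_private *p)
{
	bool cp_is_finished = false; /* assumed name for the local flag */

	if (p->irb.solicited) {
		if (p->irb.final && p->state == MODEL_CP_PENDING) {
			p->cp_allocated = false; /* stands in for cp_free() */
			cp_is_finished = true;
		}
	}

	pthread_mutex_lock(&p->io_mutex);
	memcpy(&p->irb_area, &p->irb, sizeof(p->irb));
	pthread_mutex_unlock(&p->io_mutex);

	/* Accept new work only after the IRB copy, and only for a finished request. */
	if (cp_is_finished)
		p->state = MODEL_IDLE;
}

/* Run one interrupt through the model and report the resulting state. */
enum model_state model_run(bool solicited, bool final, enum model_state start)
{
	struct model_private p = {
		.state = start,
		.cp_allocated = (start == MODEL_CP_PENDING),
		.io_mutex = PTHREAD_MUTEX_INITIALIZER,
		.irb = { .solicited = solicited, .final = final, .data = { 0x42 } },
	};

	model_io_todo(&p);
	assert(memcmp(&p.irb_area, &p.irb, sizeof(p.irb)) == 0);
	return p.state;
}
```

With that ordering, a final solicited interrupt for a pending request
ends in IDLE, while an unsolicited interrupt (or a non-final one)
leaves the state where it was. Whether it also closes the CSCH + SSCH
window described in the commit message is a separate question that
this model does not answer.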