On Tue, 13 Apr 2021 20:24:10 +0200
Eric Farman <farman@xxxxxxxxxxxxx> wrote:

> Today, the stacked call to vfio_ccw_sch_io_todo() does three things:
>
> 1) Update a solicited IRB with CP information, and release the CP
>    if the interrupt was the end of a START operation.
> 2) Copy the IRB data into the io_region, under the protection of
>    the io_mutex
> 3) Reset the vfio-ccw FSM state to IDLE to acknowledge that
>    vfio-ccw can accept more work.
>
> The trouble is that step 3 is (A) invoked for both solicited and
> unsolicited interrupts, and (B) sitting after the mutex for step 2.
> This second piece becomes a problem if it processes an interrupt
> for a CLEAR SUBCHANNEL while another thread initiates a START,
> thus allowing the CP and FSM states to get out of sync. That is:
>
>     CPU 1                           CPU 2
>     fsm_do_clear()
>     fsm_irq()
>                                     fsm_io_request()
>                                     fsm_io_helper()
>     vfio_ccw_sch_io_todo()
>                                     fsm_irq()
>                                     vfio_ccw_sch_io_todo()
>
> Let's move the reset of the FSM state to the point where the
> channel_program struct is cleaned up, which is only done for
> solicited interrupts anyway.
>
> Signed-off-by: Eric Farman <farman@xxxxxxxxxxxxx>
> ---
>  drivers/s390/cio/vfio_ccw_drv.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> index 8c625b530035..e51318f23ca8 100644
> --- a/drivers/s390/cio/vfio_ccw_drv.c
> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> @@ -94,16 +94,15 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>  		 (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
>  	if (scsw_is_solicited(&irb->scsw)) {
>  		cp_update_scsw(&private->cp, &irb->scsw);
> -		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)
> +		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING) {
>  			cp_free(&private->cp);
> +			private->state = VFIO_CCW_STATE_IDLE;
> +		}
>  	}
>  	mutex_lock(&private->io_mutex);
>  	memcpy(private->io_region->irb_area, irb, sizeof(*irb));
>  	mutex_unlock(&private->io_mutex);
>  
> -	if (private->mdev && is_final)
> -		private->state = VFIO_CCW_STATE_IDLE;

Isn't that re-allowing new I/O requests a bit too early? Maybe remember
that we had a final I/O interrupt for an I/O request and only change
the state in this case?

> -
>  	if (private->io_trigger)
>  		eventfd_signal(private->io_trigger, 1);
>  }
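
To make the suggestion above concrete, here is a rough, untested
sketch as a standalone userspace model, not the driver code. The
names model_private, model_io_todo() and cp_is_finished are made up
for illustration; the assignment to cp_active stands in for cp_free()
and the assignment to irb_copied for the memcpy() done under io_mutex.

```c
#include <stdbool.h>

/* Simplified stand-ins for the vfio-ccw FSM states (hypothetical names). */
enum model_state { MODEL_IDLE, MODEL_CP_PROCESSING, MODEL_CP_PENDING };

struct model_private {
	enum model_state state;
	bool cp_active;		/* models an allocated channel program */
	bool irb_copied;	/* models the IRB copy into io_region */
};

/*
 * Model of the todo handler: remember that a final, solicited
 * interrupt ended an I/O request, but only go back to IDLE after
 * the IRB has been copied.
 */
static void model_io_todo(struct model_private *p, bool solicited,
			  bool is_final)
{
	bool cp_is_finished = false;

	if (solicited) {
		if (is_final && p->state == MODEL_CP_PENDING) {
			p->cp_active = false;	/* stands in for cp_free() */
			cp_is_finished = true;
		}
	}

	/* stands in for the memcpy() under io_mutex */
	p->irb_copied = true;

	/* only now allow new I/O requests, and only for a finished CP */
	if (cp_is_finished)
		p->state = MODEL_IDLE;
}
```

With that ordering, an unsolicited interrupt leaves the state alone,
and the state only moves to IDLE once the IRB data is in place.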