On Wed, 2021-04-21 at 08:58 -0400, Eric Farman wrote:
> On Wed, 2021-04-21 at 12:25 +0200, Cornelia Huck wrote:
> > On Tue, 13 Apr 2021 20:24:10 +0200
> > Eric Farman <farman@xxxxxxxxxxxxx> wrote:
> > 
> > > Today, the stacked call to vfio_ccw_sch_io_todo() does three
> > > things:
> > > 
> > > 1) Update a solicited IRB with CP information, and release the CP
> > >    if the interrupt was the end of a START operation.
> > > 2) Copy the IRB data into the io_region, under the protection of
> > >    the io_mutex
> > > 3) Reset the vfio-ccw FSM state to IDLE to acknowledge that
> > >    vfio-ccw can accept more work.
> > > 
> > > The trouble is that step 3 is (A) invoked for both solicited and
> > > unsolicited interrupts, and (B) sitting after the mutex for step 2.
> > > This second piece becomes a problem if it processes an interrupt
> > > for a CLEAR SUBCHANNEL while another thread initiates a START,
> > > thus allowing the CP and FSM states to get out of sync. That is:
> > > 
> > >     CPU 1                           CPU 2
> > >     fsm_do_clear()
> > >     fsm_irq()
> > >                                     fsm_io_request()
> > >                                     fsm_io_helper()
> > >     vfio_ccw_sch_io_todo()
> > >                                     fsm_irq()
> > >                                     vfio_ccw_sch_io_todo()
> > > 
> > > Let's move the reset of the FSM state to the point where the
> > > channel_program struct is cleaned up, which is only done for
> > > solicited interrupts anyway.
> > > 
> > > Signed-off-by: Eric Farman <farman@xxxxxxxxxxxxx>
> > > ---
> > >  drivers/s390/cio/vfio_ccw_drv.c | 7 +++----
> > >  1 file changed, 3 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/s390/cio/vfio_ccw_drv.c
> > > b/drivers/s390/cio/vfio_ccw_drv.c
> > > index 8c625b530035..e51318f23ca8 100644
> > > --- a/drivers/s390/cio/vfio_ccw_drv.c
> > > +++ b/drivers/s390/cio/vfio_ccw_drv.c
> > > @@ -94,16 +94,15 @@ static void vfio_ccw_sch_io_todo(struct
> > > work_struct *work)
> > >  		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
> > >  	if (scsw_is_solicited(&irb->scsw)) {
> > >  		cp_update_scsw(&private->cp, &irb->scsw);
> > > -		if (is_final && private->state ==
> > > VFIO_CCW_STATE_CP_PENDING)
> > > +		if (is_final && private->state ==
> > > VFIO_CCW_STATE_CP_PENDING) {
> > >  			cp_free(&private->cp);
> > > +			private->state = VFIO_CCW_STATE_IDLE;
> > > +		}
> > >  	}
> > >  	mutex_lock(&private->io_mutex);
> > >  	memcpy(private->io_region->irb_area, irb, sizeof(*irb));
> > >  	mutex_unlock(&private->io_mutex);
> > >  
> > > -	if (private->mdev && is_final)
> > > -		private->state = VFIO_CCW_STATE_IDLE;
> > 
> > Isn't that re-allowing new I/O requests a bit too early?
> 
> Hrm... I guess I don't see what work vfio-ccw has left to do that is
> preventing it from carrying on. The copying of the IRB data back into
> the io_region seems like a flimsy gate to me. But...
> 
> It seems you're (rightly) concerned with userspace doing SSCH + SSCH,
> whereas I've been focused on the CSCH + SSCH sequence. So with this
> change, we're inviting the possibility of a second SSCH being able to
> be submitted/started before the IRB data for the first SSCH is copied
> (and presumably before userspace is tapped to read that data back).
> 
> Sigh... I guess that's not the greatest behavior either. Gotta ruminate
> on this.
> 
> > Maybe remember that we had a final I/O interrupt for an I/O request
> > and only change the state in this case?
> 
> As a local flag within this routine? Hrm...
> I have entirely too many
> "Let's try this" branches that didn't work, but I don't see that one
> jumping out at me. Will give it a try.

Still going strong, so that bodes really well (knock wood). I need to
spend a little time with patch 2 before I send the next version, but
that shouldn't be too long.

Eric

> 
> > 
> > > -
> > >  	if (private->io_trigger)
> > >  		eventfd_signal(private->io_trigger, 1);
> > >  }
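
For reference, the "local flag" idea discussed above could look roughly
like the userspace model below. This is only a sketch under assumptions:
the names model_state, model_private, model_io_todo and cp_is_finished
are hypothetical stand-ins, not the driver's definitions, and the model
is single-threaded, so it illustrates the ordering of the state reset
but not the locking.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's FSM states; not the kernel enum. */
enum model_state { MODEL_IDLE, MODEL_CP_PROCESSING, MODEL_CP_PENDING };

/* Toy model of the per-subchannel private data (assumed layout). */
struct model_private {
	enum model_state state;
	bool cp_initialized;	/* models whether a channel program is live */
	int irb_copy;		/* models io_region->irb_area */
};

/*
 * Model of vfio_ccw_sch_io_todo() with a local flag: the reset to IDLE
 * stays after the IRB copy, but only happens if this call freed the CP.
 */
static void model_io_todo(struct model_private *p, int irb,
			  bool solicited, bool is_final)
{
	bool cp_is_finished = false;

	if (solicited && is_final && p->state == MODEL_CP_PENDING) {
		p->cp_initialized = false;	/* models cp_free() */
		cp_is_finished = true;
	}

	p->irb_copy = irb;	/* models the memcpy() under io_mutex */

	/* Leave the state alone for interrupts that did not end a START. */
	if (cp_is_finished)
		p->state = MODEL_IDLE;
}
```

In this model, a final interrupt that arrives while the state is
MODEL_CP_PROCESSING (for example, the CSCH interrupt racing with a new
START) still copies the IRB but leaves the state untouched. A final
interrupt in MODEL_CP_PENDING frees the CP and moves to MODEL_IDLE only
after the copy, which keeps the SSCH + SSCH ordering from the old code.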