On Tue, Sep 14 2021, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > On Mon, Sep 13, 2021 at 04:31:54PM -0400, Eric Farman wrote: >> > I rebased it and fixed it up here: >> > >> > https://github.com/jgunthorpe/linux/tree/vfio_ccw >> > >> > Can you try again? >> >> That does address the crash, but then why is it processing a BROKEN >> event? Seems problematic. > > The stuff related to the NOT_OPER looked really wonky to me. I'm > guessing this is the issue - not sure about the pmcw.ena either.. [I have still not been able to digest the whole series, sorry.] > > diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c > index 5ea392959c0711..0d4d4f425befac 100644 > --- a/drivers/s390/cio/vfio_ccw_fsm.c > +++ b/drivers/s390/cio/vfio_ccw_fsm.c > @@ -380,29 +380,19 @@ static void fsm_open(struct vfio_ccw_private *private, > spin_unlock_irq(sch->lock); > } > > -static void fsm_close(struct vfio_ccw_private *private, > - enum vfio_ccw_event event) > +static int flush_sch(struct vfio_ccw_private *private) > { > struct subchannel *sch = private->sch; > DECLARE_COMPLETION_ONSTACK(completion); > int iretry, ret = 0; > > - spin_lock_irq(sch->lock); > - if (!sch->schib.pmcw.ena) > - goto err_unlock; > - ret = cio_disable_subchannel(sch); > - if (ret != -EBUSY) > - goto err_unlock; > - > iretry = 255; > do { > - > ret = cio_cancel_halt_clear(sch, &iretry); > - > if (ret == -EIO) { > pr_err("vfio_ccw: could not quiesce subchannel 0.%x.%04x!\n", > sch->schid.ssid, sch->schid.sch_no); > - break; > + return ret; Looking at this, I wonder why we had special-cased -EIO -- for -ENODEV we should be done as well, as then the device is dead and we do not need to disable it. > } > > /* > @@ -413,13 +403,28 @@ static void fsm_close(struct vfio_ccw_private *private, > spin_unlock_irq(sch->lock); > > if (ret == -EBUSY) > - wait_for_completion_timeout(&completion, 3*HZ); > + wait_for_completion_timeout(&completion, 3 * HZ); > > private->completion = NULL; > flush_workqueue(vfio_ccw_work_q); > spin_lock_irq(sch->lock); > ret = cio_disable_subchannel(sch); > } while (ret == -EBUSY); > + return ret; > +} > + > +static void fsm_close(struct vfio_ccw_private *private, > + enum vfio_ccw_event event) > +{ > + struct subchannel *sch = private->sch; > + int ret; > + > + spin_lock_irq(sch->lock); > + if (!sch->schib.pmcw.ena) > + goto err_unlock; > + ret = cio_disable_subchannel(sch); cio_disable_subchannel() should be happy to disable an already disabled subchannel, so I guess we can just walk through this and end up in CLOSED state... unless entering with !ena actually indicates that we messed up somewhere else in the state machine. I still need to find time to read the patches. > + if (ret == -EBUSY) > + ret = flush_sch(private); > if (ret) > goto err_unlock; > private->state = VFIO_CCW_STATE_CLOSED;