On Fri, Sep 17 2021, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Fri, Sep 17, 2021 at 01:59:16PM +0200, Cornelia Huck wrote:
>> >  		ret = cio_cancel_halt_clear(sch, &iretry);
>> > -
>> >  		if (ret == -EIO) {
>> >  			pr_err("vfio_ccw: could not quiesce subchannel 0.%x.%04x!\n",
>> >  			       sch->schid.ssid, sch->schid.sch_no);
>> > -			break;
>> > +			return ret;
>>
>> Looking at this, I wonder why we had special-cased -EIO -- for -ENODEV
>> we should be done as well, as then the device is dead and we do not need
>> to disable it.
>
> cio_cancel_halt_clear() should probably succeed in that case.

It will actually give us -ENODEV, as the very first call in that
function will already fail.

>
>> > @@ -413,13 +403,28 @@ static void fsm_close(struct vfio_ccw_private *private,
>> >  		spin_unlock_irq(sch->lock);
>> >
>> >  		if (ret == -EBUSY)
>> > -			wait_for_completion_timeout(&completion, 3*HZ);
>> > +			wait_for_completion_timeout(&completion, 3 * HZ);
>> >
>> >  		private->completion = NULL;
>> >  		flush_workqueue(vfio_ccw_work_q);
>> >  		spin_lock_irq(sch->lock);
>> >  		ret = cio_disable_subchannel(sch);
>> >  	} while (ret == -EBUSY);
>> > +	return ret;
>> > +}
>> > +
>> > +static void fsm_close(struct vfio_ccw_private *private,
>> > +		      enum vfio_ccw_event event)
>> > +{
>> > +	struct subchannel *sch = private->sch;
>> > +	int ret;
>> > +
>> > +	spin_lock_irq(sch->lock);
>> > +	if (!sch->schib.pmcw.ena)
>> > +		goto err_unlock;
>> > +	ret = cio_disable_subchannel(sch);
>>
>> cio_disable_subchannel() should be happy to disable an already disabled
>> subchannel, so I guess we can just walk through this and end up in
>> CLOSED state... unless entering with !ena actually indicates that we
>> messed up somewhere else in the state machine. I still need to find time
>> to read the patches.
>
> I don't know, I looked at that ena stuff for a bit and couldn't guess
> what it is trying to do.

It is one of the bits in the pmcw control block that can be modified; if
it is 1, the subchannel is enabled and can be used for I/O, if it is 0,
the subchannel is disabled and all instructions that initiate or stop
I/O will fail. Basically, you enable the subchannel if you actually want
to access the device associated with it.

Online/offline for (normal usage) ccw devices maps (among other things)
to the associated subchannel being enabled/disabled; for a subchannel
that is supposed to be passed via vfio-ccw, we want to have it enabled
so that it is actually usable.

I think the ena checking had been inspired by what the ccw bus does. We
could probably just forge ahead in any case and the called functions in
the css bus would be able to handle it just fine, but I have not
double-checked.

> Arguably the channel should not be ripped away from vfio while the FSM
> is in the open states, so I'm not sure what a lot of this is for.

We could have surprise removal (i.e. a subchannel in active use being
ripped out), as that's what happens on real hardware as well. E.g. doing
a device_del in QEMU.
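
To make the ena semantics above a bit more concrete, here is a toy
user-space model, not kernel code. Everything in it (struct
toy_subchannel, the toy_* helpers, the operational flag standing in for
surprise removal) is made up for illustration. Returning -ENODEV for
I/O on a disabled subchannel is an assumption of the sketch, and the
idempotent disable mirrors what is guessed at above rather than
anything I have verified in the css bus code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for the single pmcw bit discussed above (hypothetical type). */
struct toy_subchannel {
	bool ena;         /* 1: enabled, usable for I/O; 0: disabled */
	bool operational; /* cleared to model the device being ripped out */
};

/* Hypothetical helper: enable the subchannel so the device can be accessed. */
static int toy_enable(struct toy_subchannel *sch)
{
	if (!sch->operational)
		return -ENODEV;
	sch->ena = true;
	return 0;
}

/* Hypothetical helper: disabling an already disabled subchannel is fine (assumed). */
static int toy_disable(struct toy_subchannel *sch)
{
	if (!sch->operational)
		return -ENODEV;
	sch->ena = false;
	return 0;
}

/* Hypothetical helper: starting I/O fails unless the subchannel is enabled. */
static int toy_start_io(const struct toy_subchannel *sch)
{
	if (!sch->operational || !sch->ena)
		return -ENODEV; /* error code chosen for the sketch */
	return 0;
}

/* Walk through enable, I/O, a double disable, and a surprise removal. */
static int toy_demo(void)
{
	struct toy_subchannel sch = { .ena = false, .operational = true };

	assert(toy_start_io(&sch) == -ENODEV); /* disabled: no I/O */
	assert(toy_enable(&sch) == 0);
	assert(toy_start_io(&sch) == 0);       /* enabled: I/O works */

	assert(toy_disable(&sch) == 0);
	assert(toy_disable(&sch) == 0);        /* second disable is harmless */
	assert(toy_start_io(&sch) == -ENODEV);

	assert(toy_enable(&sch) == 0);
	sch.operational = false;               /* surprise removal while in use */
	assert(toy_start_io(&sch) == -ENODEV);
	assert(toy_disable(&sch) == -ENODEV);  /* device is dead, nothing to disable */
	return 0;
}
```

In this model, a close path that skips the !ena check and just calls
toy_disable() ends up in the same place either way; the only case that
needs separate handling is the -ENODEV from a subchannel that is
already gone.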