On Thu, Sep 09 2021, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > The subchannel should be left in a quiescent state unless the VFIO device > FD is opened. When the FD is opened bring the chanel to active and allow > the VFIO device to operate. When the device FD is closed then quiesce the > channel. > > To make this work the FSM needs to handle the transitions to/from open and > closed so everything is sequenced. Rename state NOT_OPER to BROKEN and use > it wheneven the driver has malfunctioned. STANDBY becomes CLOSED. The > normal case FSM looks like: > CLOSED -> IDLE -> PROCESS/PENDING* -> IDLE -> CLOSED > > With a possible branch off to BROKEN from any state. Once the device is in > BROKEN it cannot be recovered other than be reloading the driver. Hm, not sure whether it is a good idea to conflate "something went wrong" and "device is not operational". In the latter case, we will eventually get a removal of the css device when the common I/O layer has processed the channel report for the subchannel; while the former case could mean all kind of things, but the subchannel will likely stay around. I think NOT_OPER was always meant to be a transitional state. > > Delete the triply redundant calls to > vfio_ccw_sch_quiesce(). vfio_ccw_mdev_close_device() always leaves the > subchannel quiescent. vfio_ccw_mdev_remove() cannot return until > vfio_ccw_mdev_close_device() completes and vfio_ccw_sch_remove() cannot > return until vfio_ccw_mdev_remove() completes. Have the FSM code take care > of calling cp_free() when appropriate. I remember some serialization issues wrt cp_free() etc. coming up every now and than; that might need extra care (I'm taking a look.) > > Device reset becomes a CLOSE/OPEN sequence which now properly handles the > situation if the device becomes BROKEN. > > Machine shutdown via vfio_ccw_sch_shutdown() now simply tries to close and > leaves the device BROKEN (though arguably the bus should take care to > quiet down the subchannel HW during shutdown, not the drivers) The problem is that there is not really a uniform thing that can be done for shutdown; e.g. we can quiesce and then disable I/O and EADM subchannels, but CHSC subchannels cannot be quiesced.