On Wed, 28 Apr 2021 14:20:08 -0300 Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > On Wed, Apr 28, 2021 at 07:09:49PM +0200, Cornelia Huck wrote: > > On Mon, 26 Apr 2021 17:00:09 -0300 > > Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > > > > > This is more complicated because vfio_ccw is sharing the vfio_device > > > between both the mdev_device and its vfio_device and the css_driver. > > > > > > The mdev is a singleton, and the reason for this sharing appears to be to > > > allow the extra css_driver function callbacks to be delivered to the > > > vfio_device. > > > > > > This keeps things as they were, with the css_driver allocating the > > > singleton, not the mdev_driver, this is pretty confusing. I'm also > > > uncertain how the lifetime model for the mdev works in the css_driver > > > callbacks. > > > > > > At this point embed the vfio_device in the vfio_ccw_private and > > > instantiate it as a vfio_device when the mdev probes. The drvdata of both > > > the css_device and the mdev_device point at the private, and container_of > > > is used to get it back from the vfio_device. > > > > I've been staring at this for some time, and I'm not sure whether this > > is a good approach. > > > > We allow at most one mdev per subchannel (slicing it up does not make > > sense), so we can be sure that there's a 1:1 relationship between mdev > > and parent device, and we can track it via a single pointer. > > This seems like one of these cases where using the mdev GUID API was not a > great fit. The ccs_driver should have just directly created a > vfio_device and not gone into the mdev guid lifecycle world. I don't remember much of the discussion back then, but I don't think the explicit generation of devices was the part we needed, but rather some other kind of mediation -- probably iommu related, as subchannels don't have that concept on their own. Anyway, too late to change now. > > > The vfio_ccw_private driver data is allocated during probe (same as for > > other css_drivers.) Embedding a vfio_device here means that we have a > > structure tied into it that is operating with different lifetime rules. > > > > What about creating a second structure instead that can embed the > > vfio_device, is allocated during mdev probing, and is linked up with > > the vfio_ccw_private structure? That would follow the pattern of other > > drivers more closely. > > IIRC we still end up with pointers crossing between the two > structs. If you can't convince yourself that is correct (and I could > not) then it is already buggy today. > > It is as I said to Eric, either there is no concurrency when there is > no mdev and everything is correct today, or there is concurrency and > it seems buggy today too. > > The right answer it to move the allocations out of the css_driver > probe and put them only in the mdev driver probe because they can only > make sense when the mdev driver is instantiated. Then everything is > clear and very understandable how it should work. > > I almost did this, but couldn't figure out how the lifetime of the > ccs_driver callbacks are working relative to the lifetime of the mdev > device since they also reach into these structs. Maybe they can't be > called for some css related reason? Moving allocations to the mdev driver probe makes sense, I guess. We should also move enabling the subchannel to that point in time (I don't remember why we enable it in the css probe function, and can't think of a good reason for that; obviously needs to be paired with quiescing and disabling the subchannel in the mdev driver remove function); that leaves the uevent dance (which can hopefully also be removed, if some discussed changes are implemented in the common I/O layer) and fencing QDIO. Regarding the other callbacks, - vfio_ccw_sch_irq should not be invoked if the subchannel is not enabled; maybe log a message before returning for !private. - vfio_ccw_sch_remove should be able to return 0 for !private (nothing to quiesce, if the subchannel is not enabled). - vfio_ccw_sch_shutdown has nothing to do for !private (same reason.) - In vfio_ccw_sch_event, we should either skip the fsm_event and the state change for !private, or return 0 in that case. - vfio_ccw_chp_event already checks for !private. Not sure whether we should try to update some control blocks and return -ENODEV if the subchannel is not operational, but it's probably not needed. Eric, what do you think?