On Mon, Sep 05, 2022 at 10:46:44AM +0100, Robin Murphy wrote: > > I've been trying to understand Robin's latest remarks because maybe I > > don't really understand your situation right. > > That was really just me thinking out loud to guess at how it must be > happening - I wasn't sure whether VFIO is actually intended to allow that or > not, so if not then by all means let's look at fixing that, but as I say I > think we're only seeing it provoke a problem at the driver level because of > 9ac8545199a1, and fixing VFIO doesn't fix that in general. And conversely if > we *can* fix that properly at the IOMMU API level then the current VFIO > behaviour should become benign again anyway. Okay, so there are probably other problems here that highlighted this.. > > IMHO this is definately a VFIO bug, because in a single-device group > > we must not allow the domain to remain attached past remove(). Or more > > broadly we shouldn't be holding ownership of a group without also > > having a driver attached. > > FWIW I was assuming it might be fine for a VFIO user to hold the group open > if they expect the device to come back again and re-bind (for example, > perhaps over some reconfiguration that requires turning SR-IOV off and on > again?) Once all the devices in the group are removed then something like pci_device_group() will have no way to discover the group again. eg in the SRIOV case it will just fall right down to iommu_group_alloc(), and that gives a new struct iommu_group and new IDR allocation. So in the general case this doesn't happen, I don't think any VFIO userspace should attempt to rely on it. >From an API perspective is a much saner API toward iommu using drivers like VFIO if those drivers only use the iommu api while they have a device driver attached. Regards, Jason