On Mon, May 16, 2022 at 11:21:19PM -0700, Christoph Hellwig wrote:
> On Mon, May 16, 2022 at 02:27:34PM -0300, Jason Gunthorpe wrote:
> > Normally you'd want to do what is kvm_s390_pci_register_kvm() here,
> > where a failure can be propagated but then you have a race condition
> > with the kvm.
> >
> > Blech, maybe it is time to just fix this race condition permanently,
> > what do you think? (I didn't even compile it)
>
> This is roughly where I was planning to get to, with one difference:
> I don't think we need or even want the VFIO_DEVICE_NEEDS_KVM flag.
> Instead just propagate ->kvm to the device whenever it is set and
> let drivers that have a hard requirement on it like gvt fail if it
> isn't there.

I did it so we didn't uselessly hold a ref on the kvm object, but
maybe that is not relevant.

> The other question is if we even need an extra reference per device,
> can't we hold the group reference until all devices are gone
> anyway? That would remove the need to include kvm_host.h in the
> vfio code.

The device does now hold a reference on the group fd after this patch
series:

https://lore.kernel.org/r/0-v2-d035a1842d81+1bf-vfio_group_locking_jgg@xxxxxxxxxx

However the group does not hold a reference on the KVM, it has a
set/remove interface toward KVM and can have its group->kvm pointer
NULL'd via an ioctl at any time.

So, the semantic here is that the KVM is captured when the device FD
opens and then is immutable for the lifetime of that device FD even if
the group FD's KVM is reassigned or removed.

And I realize that it is all botched, this needs to check and respect
the open_count which requires nesting the locks.

Jason
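
To illustrate the capture-at-open semantic described above, here is a
minimal userspace sketch. It is not the patch under discussion and does
not use the real VFIO or KVM types: every toy_* name is hypothetical, the
pthread mutexes stand in for whatever kernel locks would be used, and the
reference count is just a counter. It only shows the group lock nested
inside the device lock, so that the first open samples group->kvm and
takes the reference in one step, and later changes on the group do not
affect an already-open device.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real struct kvm / vfio_group / vfio_device. */
struct toy_kvm {
	atomic_int refcount;
};

struct toy_group {
	pthread_mutex_t kvm_lock;	/* protects ->kvm */
	struct toy_kvm *kvm;		/* may be set or cleared at any time */
};

struct toy_device {
	pthread_mutex_t dev_lock;	/* protects open_count and ->kvm */
	struct toy_group *group;
	unsigned int open_count;
	struct toy_kvm *kvm;		/* captured at first open, fixed until last close */
};

static void toy_kvm_get(struct toy_kvm *kvm)
{
	atomic_fetch_add(&kvm->refcount, 1);
}

static void toy_kvm_put(struct toy_kvm *kvm)
{
	atomic_fetch_sub(&kvm->refcount, 1);
}

/* Models the group's set/remove interface: pass NULL to remove. */
static void toy_group_set_kvm(struct toy_group *g, struct toy_kvm *kvm)
{
	pthread_mutex_lock(&g->kvm_lock);
	g->kvm = kvm;
	pthread_mutex_unlock(&g->kvm_lock);
}

static void toy_device_open(struct toy_device *d)
{
	pthread_mutex_lock(&d->dev_lock);
	if (d->open_count++ == 0) {
		/* Nested lock: read group->kvm and take the ref together. */
		pthread_mutex_lock(&d->group->kvm_lock);
		d->kvm = d->group->kvm;
		if (d->kvm)
			toy_kvm_get(d->kvm);
		pthread_mutex_unlock(&d->group->kvm_lock);
	}
	pthread_mutex_unlock(&d->dev_lock);
}

static void toy_device_close(struct toy_device *d)
{
	pthread_mutex_lock(&d->dev_lock);
	assert(d->open_count > 0);
	/* Only the last close drops the captured reference. */
	if (--d->open_count == 0 && d->kvm) {
		toy_kvm_put(d->kvm);
		d->kvm = NULL;
	}
	pthread_mutex_unlock(&d->dev_lock);
}
```

In this sketch, calling toy_group_set_kvm(g, NULL) while the device is
open leaves d->kvm untouched, and a second toy_device_open() neither
re-samples the group pointer nor takes another reference.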