On Wed, Jan 11, 2023 at 08:53:34PM +0000, Sean Christopherson wrote: > On Wed, Jan 11, 2023, Jason Gunthorpe wrote: > > On Wed, Jan 11, 2023 at 07:54:51PM +0000, Sean Christopherson wrote: > > > > > Something feels off. If KVM's refcount is 0, then accessing device->group->kvm > > > in vfio_device_open() can't happen unless there's a refcounting bug somewhere. > > > > The problem is in close, not open. > > The deadlock problem is, yes. My point is that if group_lock needs to be taken > when nullifying group->kvm during kvm_vfio_destroy(), then there is also a refcounting > prolem with respect to open(). If there is no refcounting problem, then nullifying > group->kvm during kvm_vfio_destroy() is unnecessary (but again, I doubt this is > the case). IIRC the drivers are supposed to use one of the refcount not zero incrs to counteract this, but I never checked they do.. Yi is working on a patch to change things so vfio drops the kvm pointer when the kvm file closes, not when the reference goes to 0 to avoid a refcount cycle problem which should also solve that. > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c > index 6e8804fe0095..b3a84d65baa6 100644 > --- a/drivers/vfio/vfio_main.c > +++ b/drivers/vfio/vfio_main.c > @@ -772,7 +772,12 @@ static struct file *vfio_device_open(struct vfio_device *device) > * reference and release it during close_device. > */ > mutex_lock(&device->group->group_lock); > - device->kvm = device->group->kvm; > + > + if (device->kvm_ops && device->group->kvm) { > + ret = device->kvm_ops->get_kvm(device->group->kvm); At this point I'd rather just use the symbol get stuff like kvm does and call the proper functions. Jason