> From: Zhao, Yan Y <yan.y.zhao@xxxxxxxxx> > Sent: Friday, June 28, 2024 11:19 PM > > In the device cdev path, adjust the handling of the KVM reference count to > only increment with the first vfio_df_open() and decrement after the final > vfio_df_close(). This change addresses a KVM reference leak that occurs > when a device cdev file is opened multiple times and attempts to bind to > iommufd repeatedly. > > Currently, vfio_df_get_kvm_safe() is invoked prior to each vfio_df_open() > in the cdev path during iommufd binding. The corresponding > vfio_device_put_kvm() is executed either when iommufd is unbound or if an > error occurs during the binding process. > > However, issues arise when a device binds to iommufd more than once. The > second vfio_df_open() will fail during iommufd binding, and > vfio_device_put_kvm() will be triggered, setting device->kvm to NULL. > Consequently, when iommufd is unbound from the first successfully bound > device, vfio_device_put_kvm() becomes ineffective, leading to a leak in the > KVM reference count. To be accurate this happens only when two binds are issued via different fds otherwise below check will happen earlier when two binds are in a same fd: /* one device cannot be bound twice */ if (df->access_granted) { ret = -EINVAL; goto out_unlock; } > > Below is the calltrace that will be produced in this scenario when the KVM > module is unloaded afterwards, reporting "BUG kvm_vcpu (Tainted: G S): > Objects remaining in kvm_vcpu on __kmem_cache_shutdown()". > > Call Trace: > <TASK> > dump_stack_lvl+0x80/0xc0 > slab_err+0xb0/0xf0 > ? __kmem_cache_shutdown+0xc1/0x4e0 > ? rcu_is_watching+0x11/0x50 > ? lock_acquired+0x144/0x3c0 > __kmem_cache_shutdown+0x1b7/0x4e0 > kmem_cache_destroy+0xa6/0x260 > kvm_exit+0x80/0xc0 [kvm] > vmx_exit+0xe/0x20 [kvm_intel] > __x64_sys_delete_module+0x143/0x250 > ? ktime_get_coarse_real_ts64+0xd3/0xe0 > ? syscall_trace_enter+0x143/0x210 > do_syscall_64+0x6f/0x140 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > Fixes: 5fcc26969a16 ("vfio: Add VFIO_DEVICE_BIND_IOMMUFD") > Signed-off-by: Yan Zhao <yan.y.zhao@xxxxxxxxx> > --- > drivers/vfio/device_cdev.c | 13 +++++++++---- > 1 file changed, 9 insertions(+), 4 deletions(-) > > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c > index bb1817bd4ff3..3b85d01d1b27 100644 > --- a/drivers/vfio/device_cdev.c > +++ b/drivers/vfio/device_cdev.c > @@ -65,6 +65,7 @@ long vfio_df_ioctl_bind_iommufd(struct > vfio_device_file *df, > { > struct vfio_device *device = df->device; > struct vfio_device_bind_iommufd bind; > + bool put_kvm = false; > unsigned long minsz; > int ret; > > @@ -101,12 +102,15 @@ long vfio_df_ioctl_bind_iommufd(struct > vfio_device_file *df, > } > > /* > - * Before the device open, get the KVM pointer currently > + * Before the device's first open, get the KVM pointer currently > * associated with the device file (if there is) and obtain > - * a reference. This reference is held until device closed. > + * a reference. This reference is held until device's last closed. > * Save the pointer in the device for use by drivers. > */ > - vfio_df_get_kvm_safe(df); > + if (device->open_count == 0) { > + vfio_df_get_kvm_safe(df); > + put_kvm = true; > + } > > ret = vfio_df_open(df); > if (ret) > @@ -129,7 +133,8 @@ long vfio_df_ioctl_bind_iommufd(struct > vfio_device_file *df, > out_close_device: > vfio_df_close(df); > out_put_kvm: > - vfio_device_put_kvm(device); > + if (put_kvm) > + vfio_device_put_kvm(device); > iommufd_ctx_put(df->iommufd); > df->iommufd = NULL; > out_unlock: > what about extending vfio_df_open() to unify the get/put_kvm() and open_count trick in one place? int vfio_df_open(struct vfio_device_file *df, struct kvm *kvm, spinlock_t *kvm_ref_lock) { struct vfio_device *device = df->device; int ret = 0; lockdep_assert_held(&device->dev_set->lock); if (device->open_count == 0) { spin_lock(kvm_ref_lock); vfio_device_get_kvm_safe(device, kvm); spin_unlock(kvm_ref_lock); } /* * Only the group path allows the device to be opened multiple * times. The device cdev path doesn't have a secure way for it. */ if (device->open_count != 0 && !df->group) return -EINVAL; device->open_count++; if (device->open_count == 1) { ret = vfio_df_device_first_open(df); if (ret) device->open_count--; } if (ret) vfio_device_put_kvm(device); return ret; } void vfio_df_close(struct vfio_device_file *df) { struct vfio_device *device = df->device; lockdep_assert_held(&device->dev_set->lock); vfio_assert_device_open(device); if (device->open_count == 1) { vfio_df_device_last_close(df); vfio_device_put_kvm(device); } device->open_count--; }