On Mon, Jul 01, 2024 at 06:30:05PM +0800, Yi Liu wrote: > On 2024/7/1 16:43, Tian, Kevin wrote: > > > From: Zhao, Yan Y <yan.y.zhao@xxxxxxxxx> > > > Sent: Friday, June 28, 2024 11:19 PM > > > > > > In the device cdev path, adjust the handling of the KVM reference count to > > > only increment with the first vfio_df_open() and decrement after the final > > > vfio_df_close(). This change addresses a KVM reference leak that occurs > > > when a device cdev file is opened multiple times and attempts to bind to > > > iommufd repeatedly. > > > > > > Currently, vfio_df_get_kvm_safe() is invoked prior to each vfio_df_open() > > > in the cdev path during iommufd binding. The corresponding > > > vfio_device_put_kvm() is executed either when iommufd is unbound or if an > > > error occurs during the binding process. > > > > > > However, issues arise when a device binds to iommufd more than once. The > > > second vfio_df_open() will fail during iommufd binding, and > > > vfio_device_put_kvm() will be triggered, setting device->kvm to NULL. > > > Consequently, when iommufd is unbound from the first successfully bound > > > device, vfio_device_put_kvm() becomes ineffective, leading to a leak in the > > > KVM reference count. > > > > To be accurate this happens only when two binds are issued via different > > fds otherwise below check will happen earlier when two binds are in a > > same fd: > > > > /* one device cannot be bound twice */ > > if (df->access_granted) { > > ret = -EINVAL; > > goto out_unlock; > > } > > yes > > > > > > > Below is the calltrace that will be produced in this scenario when the KVM > > > module is unloaded afterwards, reporting "BUG kvm_vcpu (Tainted: G S): > > > Objects remaining in kvm_vcpu on __kmem_cache_shutdown()". > > > > > > Call Trace: > > > <TASK> > > > dump_stack_lvl+0x80/0xc0 > > > slab_err+0xb0/0xf0 > > > ? __kmem_cache_shutdown+0xc1/0x4e0 > > > ? rcu_is_watching+0x11/0x50 > > > ? lock_acquired+0x144/0x3c0 > > > __kmem_cache_shutdown+0x1b7/0x4e0 > > > kmem_cache_destroy+0xa6/0x260 > > > kvm_exit+0x80/0xc0 [kvm] > > > vmx_exit+0xe/0x20 [kvm_intel] > > > __x64_sys_delete_module+0x143/0x250 > > > ? ktime_get_coarse_real_ts64+0xd3/0xe0 > > > ? syscall_trace_enter+0x143/0x210 > > > do_syscall_64+0x6f/0x140 > > > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > > > > > Fixes: 5fcc26969a16 ("vfio: Add VFIO_DEVICE_BIND_IOMMUFD") > > > Signed-off-by: Yan Zhao <yan.y.zhao@xxxxxxxxx> > > > --- > > > drivers/vfio/device_cdev.c | 13 +++++++++---- > > > 1 file changed, 9 insertions(+), 4 deletions(-) > > > > > > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c > > > index bb1817bd4ff3..3b85d01d1b27 100644 > > > --- a/drivers/vfio/device_cdev.c > > > +++ b/drivers/vfio/device_cdev.c > > > @@ -65,6 +65,7 @@ long vfio_df_ioctl_bind_iommufd(struct > > > vfio_device_file *df, > > > { > > > struct vfio_device *device = df->device; > > > struct vfio_device_bind_iommufd bind; > > > + bool put_kvm = false; > > > unsigned long minsz; > > > int ret; > > > > > > @@ -101,12 +102,15 @@ long vfio_df_ioctl_bind_iommufd(struct > > > vfio_device_file *df, > > > } > > > > > > /* > > > - * Before the device open, get the KVM pointer currently > > > + * Before the device's first open, get the KVM pointer currently > > > * associated with the device file (if there is) and obtain > > > - * a reference. This reference is held until device closed. > > > + * a reference. This reference is held until device's last closed. > > > * Save the pointer in the device for use by drivers. > > > */ > > > - vfio_df_get_kvm_safe(df); > > > + if (device->open_count == 0) { > > > + vfio_df_get_kvm_safe(df); > > > + put_kvm = true; > > > + } > > > > > > ret = vfio_df_open(df); > > > if (ret) > > > @@ -129,7 +133,8 @@ long vfio_df_ioctl_bind_iommufd(struct > > > vfio_device_file *df, > > > out_close_device: > > > vfio_df_close(df); > > > out_put_kvm: > > > - vfio_device_put_kvm(device); > > > + if (put_kvm) > > > + vfio_device_put_kvm(device); > > > iommufd_ctx_put(df->iommufd); > > > df->iommufd = NULL; > > > out_unlock: > > > > > > > what about extending vfio_df_open() to unify the get/put_kvm() > > and open_count trick in one place? > > > > int vfio_df_open(struct vfio_device_file *df, struct kvm *kvm, > > spinlock_t *kvm_ref_lock) > > { > > this should work. But need a comment to note why need pass in both kvm > and kvm_ref_lock given df has both of them. :) So why to pass them? What about making vfio_device_group_get_kvm_safe() or vfio_df_get_kvm_safe() not static and calling one of them in vfio_df_open() according to the df->group ? > > > struct vfio_device *device = df->device; > > int ret = 0; > > > > lockdep_assert_held(&device->dev_set->lock); > > > > if (device->open_count == 0) { > > spin_lock(kvm_ref_lock); > > vfio_device_get_kvm_safe(device, kvm); > > spin_unlock(kvm_ref_lock); > > } > > perhaps it can be put in the "if (device->open_count == 1)" branch, just > before invoking vfio_df_device_first_open(). What about just moving the get/put_kvm into vfio_df_device_first_open()? > > > > /* > > * Only the group path allows the device to be opened multiple > > * times. The device cdev path doesn't have a secure way for it. > > */ > > if (device->open_count != 0 && !df->group) > > return -EINVAL; > > > > device->open_count++; > > if (device->open_count == 1) { > > ret = vfio_df_device_first_open(df); > > if (ret) > > device->open_count--; > > } > > > > if (ret) > > vfio_device_put_kvm(device); > > return ret; > > } > > > > void vfio_df_close(struct vfio_device_file *df) > > { > > struct vfio_device *device = df->device; > > > > lockdep_assert_held(&device->dev_set->lock); > > > > vfio_assert_device_open(device); > > if (device->open_count == 1) { > > vfio_df_device_last_close(df); > > vfio_device_put_kvm(device); > > } > > device->open_count--; > > } > > -- > Regards, > Yi Liu