Re: [PATCH 1/2] KVM: async kvm_destroy_vm for vfio devices

Sean Christopherson <seanjc@xxxxxxxxxx> · Wed, 11 Jan 2023 20:53:34 +0000

On Wed, Jan 11, 2023, Jason Gunthorpe wrote:
> On Wed, Jan 11, 2023 at 07:54:51PM +0000, Sean Christopherson wrote:
> 
> > Something feels off.  If KVM's refcount is 0, then accessing device->group->kvm
> > in vfio_device_open() can't happen unless there's a refcounting bug somewhere.
> 
> The problem is in close, not open.

The deadlock problem is, yes.  My point is that if group_lock needs to be taken
when nullifying group->kvm during kvm_vfio_destroy(), then there is also a refcounting
prolem with respect to open().  If there is no refcounting problem, then nullifying
group->kvm during kvm_vfio_destroy() is unnecessary (but again, I doubt this is
the case).

The two things aren't directly related, but it seems possible to solve both while
making this all slightly less ugly.  Well, at least from KVM's perspective, whether
or not it'd be an improvement on the VFIO side is definitely debatable.

> Specifically it would be very hard to avoid holding the group_lock
> during close which is when the put is done.
> 
> > Rather than force devices to get KVM references, why not handle that in common
> > VFIO code and drop KVM refcountin from devices?  Worst case scenario KVM is pinned
> > by a device that doesn't need KVM but is in a group associated with KVM.  If that's
> > a concern, it seems easy enough to add a flag to vfio_device_ops to enumerate
> > whether or not the device depends on KVM.
> 
> We can't make cross-dependencies between kvm and core VFIO - it is why
> so much of this is soo ugly.

Ugh, right, modules for everyone.

> The few device drivers that unavoidably have KVM involvment already
> have a KVM module dependency, so they can safely do the get/put

Rather than store a "struct kvm *" in vfio_device, what about adding a new set
of optional ops to get/put KVM references?  Having dedicated KVM ops is gross,
but IMO it's less gross than backdooring the KVM pointer into open_device() by
stashing KVM into the device, e.g. it formalizes the VFIO API for devices that
depend on KVM instead of making devices pinky-swear to grab a reference during
open_device().

To further harden things, KVM could export only kvm_get_safe_kvm() if there are
no vendor modules.  I.e. make kvm_get_kvm() an internal-only helper when possible
and effectively force VFIO devices to use the safe variant.  That would work even
x86, as kvm_get_kvm() wouldn't be exported if neither KVM_AMD nor KVM_INTEL is
built as a module.

---
 drivers/vfio/vfio_main.c | 20 +++++++++++++-------
 include/linux/vfio.h     |  9 +++++++--
 2 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 6e8804fe0095..b3a84d65baa6 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -772,7 +772,12 @@ static struct file *vfio_device_open(struct vfio_device *device)
 		 * reference and release it during close_device.
 		 */
 		mutex_lock(&device->group->group_lock);
-		device->kvm = device->group->kvm;
+
+		if (device->kvm_ops && device->group->kvm) {
+			ret = device->kvm_ops->get_kvm(device->group->kvm);
+			if (ret)
+				goto err_undo_count;
+		}
 
 		if (device->ops->open_device) {
 			ret = device->ops->open_device(device);
@@ -823,8 +828,9 @@ static struct file *vfio_device_open(struct vfio_device *device)
 err_undo_count:
 	mutex_unlock(&device->group->group_lock);
 	device->open_count--;
-	if (device->open_count == 0 && device->kvm)
-		device->kvm = NULL;
+	if (device->open_count == 0 && device->kvm_ops)
+		device->kvm_ops->put_kvm();
+
 	mutex_unlock(&device->dev_set->lock);
 	module_put(device->dev->driver->owner);
 err_unassign_container:
@@ -1039,8 +1045,8 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 	}
 	mutex_unlock(&device->group->group_lock);
 	device->open_count--;
-	if (device->open_count == 0)
-		device->kvm = NULL;
+	if (device->open_count == 0 && device->kvm_ops)
+		device->kvm_ops->put_kvm();
 	mutex_unlock(&device->dev_set->lock);
 
 	module_put(device->dev->driver->owner);
@@ -1656,8 +1662,8 @@ EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
  * @file: VFIO group file
  * @kvm: KVM to link
  *
- * When a VFIO device is first opened the KVM will be available in
- * device->kvm if one was associated with the group.
+ * When a VFIO device is first opened, the device's kvm_ops->get_kvm() will be
+ * invoked with the KVM instance associated with the group (if applicable).
  */
 void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
 {
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index fdd393f70b19..d6dcbe0546bf 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -18,6 +18,11 @@
 
 struct kvm;
 
+struct vfio_device_kvm_ops {
+	int (*get_kvm)(struct kvm *kvm);
+	void (*put_kvm)(void);
+};
+
 /*
  * VFIO devices can be placed in a set, this allows all devices to share this
  * structure and the VFIO core will provide a lock that is held around
@@ -43,8 +48,8 @@ struct vfio_device {
 	struct vfio_device_set *dev_set;
 	struct list_head dev_set_list;
 	unsigned int migration_flags;
-	/* Driver must reference the kvm during open_device or never touch it */
-	struct kvm *kvm;
+
+	const struct vfio_device_kvm_ops *kvm_ops;
 
 	/* Members below here are private, not for driver use */
 	unsigned int index;

base-commit: d52444c7a90fc551b4c3b0bda7d3f0b2ca9fc84d
--