Re: [PATCH 4/4] iommu: Fix ordering of iommu_release_device()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2022-09-08 22:43, Jason Gunthorpe wrote:
On Thu, Sep 08, 2022 at 10:27:06PM +0100, Robin Murphy wrote:

Oh, because s390 is using iommu_get_domain_for_dev() in its release_device
callback, which needs to dereference the group to work, and the current
domain may also be a non-default one which we can't prevent from
disappearing racily, that was why :(

Hum, the issue there is the use of device->iommu_group - but that just
means I didn't split properly. How about this incremental:

That did cross my mind, but it's a bit grim. In the light of the morning, I'm not sure s390 actually *needs* the group anyway - AFAICS if iommu_group_remove_device() has been processed first, that will have synchronised against any concurrent attach/detach, so zdev->s390_domain can be assumed to be up to date and used directly without the round trip through iommu_get_domain_for_dev(). That then only leaves the issue that that domain may still become invalid at any point after the group mutex has been dropped.

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index c451bf715182ac..99ef799f3fe6b5 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -351,6 +351,7 @@ void iommu_release_device(struct device *dev)
  	 * them until the have been detached. release_device() is expected to
  	 * detach all domains connected to the dev.
  	 */
+	dev->iommu_group = NULL;
  	kobject_put(group->devices_kobj);
module_put(ops->owner);
@@ -980,7 +981,6 @@ static void __iommu_group_remove_device(struct device *dev)
kfree(device->name);
  	kfree(device);
-	dev->iommu_group = NULL;
  }
/**
@@ -995,6 +995,7 @@ void iommu_group_remove_device(struct device *dev)
  	struct iommu_group *group = dev->iommu_group;
__iommu_group_remove_device(dev);
+	dev->iommu_group = NULL;
  	kobject_put(group->devices_kobj);
  }
  EXPORT_SYMBOL_GPL(iommu_group_remove_device);

To me it makes sense that the driver should be able to continue to
query the iommu_group during release anyhow..

I'm not so sure, release shouldn't be depending on a group since there may never have been one anyway. Perhaps the answer is an extra pre-release step to balance probe_finalize?

And to your other question, the reason I split the function is because
I couldn't really say WTF iommu_group_remove_device() was supposed to
do. The __ version make ssense as part of the remove_device, due to
the sequencing with ops->release()

But the other one doesn't have that. So I want to put in a:

    WARN_ON(group->blocking_domain || group->default_domain);

Because calling it after those domains are allocated looks broken to
me.

I might be misunderstanding, but that sounds backwards - if a real device is being hotplugged out, we absolutely expect that to happen *after* its default domain has been set up. The external callers are using fake groups where default domains aren't relevant, and I have no idea what PAMU is doing but it's been doing it for long enough that it most likely isn't a problem. Thus wherever that check would be it would seem either wrong or unnecessary.

Thanks,
Robin.



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux