Re: [PATCH 4/4] iommu: Fix ordering of iommu_release_device()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2022-09-10 00:45, Jason Gunthorpe wrote:
On Fri, Sep 09, 2022 at 08:55:07PM +0100, Robin Murphy wrote:

Isn't this every driver though? Like every single driver implementing
an UNMANAGED/DMA/DMA_FQ domain has a hidden reference to the
iommu_domain - minimally to point the HW to the IOPTEs it stores.

Um, no? Domain ops get the domain passed in as an argument, which is far
from hidden, and if any driver implemented them to ignore that argument and
operate on something else it would be stupid and broken. Note I said
"per-device reference", meaning things like s390's zpci_dev->s390_domain and
SMMUv3's dev->iommu->priv->domain. It's only those references that are
reachable from release_device - outside the normal domain lifecycle - which
are problematic.

If the plan is to make the domain refcounted and then allow a 'put' on
it before we reach release_device() then it means every driver needs
to hold a 'get' on the domain while it is programmed into the HW.

Because the hw will still be touching memory that could be freed by an
iommu_domain_free(). By "hidden" reference I mean the HW walkers are
touching memory that would be freed - ie kasn won't see it.

As far as I'm concerned we're dealing purely with the case where release_device races with attaching back to the default domain *and* the driver has some reason for release_device to poke at what it thinks the currently-attached domain is. Anyone who frees a domain while it's still actually live deserves whatever they get; it would be thoroughly impractical to attempt to mitigate for that kind of silliness.

Do you know a reason not to hold the group mutex across
release_device? I think that is the most straightforward and
future proof.

Yes, the ones documented in the code and already discussed here. The current
functional ones aren't particularly *good* reasons, but unless and until
they can all be cleaned up they are what they are.

Uh, I feel like I missed part of the conversation - I don't know what
this list is..

s390 (remember how we got here?) calls iommu_get_domain_for_dev(). ipmmu-vmsa calls arm_iommu_detach_device() (mtk_v1 doesn't, but perhaps technically should), to undo the corresponding attach from probe_finalize - apologies for misremembering which way round the comments were.

@@ -1022,6 +1030,14 @@ void iommu_group_remove_device(struct device *dev)
   	dev_info(dev, "Removing from iommu group %d\n", group->id);
   	mutex_lock(&group->mutex);
+	if (WARN_ON(group->domain != group->default_domain &&
+		    group->domain != group->blocking_domain)) {

This will false trigger, if there are two VFIO devices then the group
will remained owned when we unplug one just of them, but the group's domain
will be a VFIO owned domain.

As opposed to currently, where most drivers' release_device will blindly
detach/disable the RID in some fashion so the other device would suddenly
blow up anyway?

Er, I think it is OK today, in the non-shared case. If the RID isn't
shared then each device in the group is independent, so most drivers,
most of the time, should only effect the RID release_device() is
called on, while this warning will always trigger for any multi-device
group.

Oh, apparently I managed to misinterpret this as the two *aliasing* devices case, sorry. Indeed it is overly conservative for that. I think the robust way to detect bad usage is actually not via the group at all, but for iommu_device_{un}use_default_domain() to also maintain a per-device ownership flag, then we warn if a device is released with that still set.

(It *will* actually work on SMMUv2 because SMMUv2 comprehensively handles
StreamID-level aliasing beyond what pci_device_group() covers, which I
remain rather proud of)

This is why I prefered not to explicitly change the domain, because at
least if someone did write a non-buggy driver it doesn't get wrecked -
and making a non-buggy driver is at least allowed by the API.

Detaching back to the default domain still seems like it's *always* the right thing to do at this point, even when it should not be warned about as above. As I say it *does* work on non-buggy drivers, and making this whole domain use-after-free race a fundamental non-issue is attractive.

And it misses the logic to WARN_ON if a domain is set and an external
entity wrongly uses iommu_group_remove_device()..

Huh? An external fake group couldn't have a default domain or blocking
domain, thus any non-NULL domain will not compare equal to either, so if
that could happen it will warn, and then most likely crash. I did think
briefly about trying to make it not crash, but then I remembered that fake
groups from external callers also aren't backed by IOMMU API drivers so have
no way to allocate or attach domains either, so in fact it cannot happen at
all under any circumstances that are worth reasoning about.

I mean specificaly thing like FSL is doing where it is a real driver
calling this API and the test of 'group->domain == NULL' is the more
robust precondition.

Having looked a bit closer, I think I get what PAMU is doing - kind of impedance-matching between pci_device_group() and its own non-ACS form of isolation, and possibly also a rather roundabout way of propagating DT data from the PCI controller node up into the PCI hierarchy. Both could quite likely be done in a more straightforward manner these days (and TBH I'm not convinced it works at all since it doesn't appear to match the actual DT binding), but either way I'm fairly confident we needn't worry about it.

Thanks,
Robin.



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux