On Mon, Sep 12, 2022 at 12:13:25PM +0100, Robin Murphy wrote: > > > Yes, the ones documented in the code and already discussed here. The current > > > functional ones aren't particularly *good* reasons, but unless and until > > > they can all be cleaned up they are what they are. > > > > Uh, I feel like I missed part of the conversation - I don't know what > > this list is.. > > s390 (remember how we got here?) calls iommu_get_domain_for_dev(). iommu_get_domain_for_dev() doesn't take a lock, and the last try I showed (see below) ordered things so that it would succeed when called under release(). AFIACT it is already not a problem. > ipmmu-vmsa calls arm_iommu_detach_device() (mtk_v1 doesn't, but perhaps > technically should), to undo the corresponding attach from probe_finalize - > apologies for misremembering which way round the comments were. This does eventually deadlock on the group->mutex, so yes this is a problem! And I agree mtk_v1 does looks like it has a memory leak. But, this looks easy enough to fix up. See below > Oh, apparently I managed to misinterpret this as the two *aliasing* devices > case, sorry. Indeed it is overly conservative for that. I think the robust > way to detect bad usage is actually not via the group at all, but for > iommu_device_{un}use_default_domain() to also maintain a per-device > ownership flag, then we warn if a device is released with that still set. It seems a bit hard to implement a per-device flag with VFIO? > > > (It *will* actually work on SMMUv2 because SMMUv2 comprehensively handles > > > StreamID-level aliasing beyond what pci_device_group() covers, which I > > > remain rather proud of) > > > > This is why I prefered not to explicitly change the domain, because at > > least if someone did write a non-buggy driver it doesn't get wrecked - > > and making a non-buggy driver is at least allowed by the API. > > Detaching back to the default domain still seems like it's *always* the > right thing to do at this point, If the RID is aliased it is the wrong thing to do. Calling attach will wreck the entire alias set. release is not supposed to do that, buggy drivers excepted. > Having looked a bit closer, I think I get what PAMU is doing - kind of > impedance-matching between pci_device_group() and its own non-ACS form of > isolation, and possibly also a rather roundabout way of propagating DT data > from the PCI controller node up into the PCI hierarchy. Both could quite > likely be done in a more straightforward manner these days (and TBH I'm not > convinced it works at all since it doesn't appear to match the actual DT > binding), but either way I'm fairly confident we needn't worry about it. Yes, this is what I thought too. Arguably it is wrong to try and rework the groups once created, it should be creating the groups to cover what it wants from the start, and it shouldn't leave the controller without a group. So something like the below is what I'm thinking Thanks, Jason diff --git a/arch/arm/include/asm/dma-iommu.h b/arch/arm/include/asm/dma-iommu.h index fe9ef6f79e9cfe..ea7198a1786186 100644 --- a/arch/arm/include/asm/dma-iommu.h +++ b/arch/arm/include/asm/dma-iommu.h @@ -31,6 +31,7 @@ void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping); int arm_iommu_attach_device(struct device *dev, struct dma_iommu_mapping *mapping); void arm_iommu_detach_device(struct device *dev); +void arm_iommu_release_device(struct device *dev); #endif /* __KERNEL__ */ #endif diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index 089c9c644cce2a..1694bebb3aa4dc 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -1697,13 +1697,15 @@ int arm_iommu_attach_device(struct device *dev, EXPORT_SYMBOL_GPL(arm_iommu_attach_device); /** - * arm_iommu_detach_device + * arm_iommu_release_device * @dev: valid struct device pointer * - * Detaches the provided device from a previously attached map. - * This overwrites the dma_ops pointer with appropriate non-IOMMU ops. + * This is like arm_iommu_detach_device() except it handles the special + * case of being called under an iommu driver's release operation. In this + * case the driver must have already detached the device from any attached + * domain before calling this function. */ -void arm_iommu_detach_device(struct device *dev) +void arm_iommu_release_device(struct device *dev) { struct dma_iommu_mapping *mapping; @@ -1713,13 +1715,34 @@ void arm_iommu_detach_device(struct device *dev) return; } - iommu_detach_device(mapping->domain, dev); kref_put(&mapping->kref, release_iommu_mapping); to_dma_iommu_mapping(dev) = NULL; set_dma_ops(dev, NULL); pr_debug("Detached IOMMU controller from %s device.\n", dev_name(dev)); } +EXPORT_SYMBOL_GPL(arm_iommu_release_device); + +/** + * arm_iommu_detach_device + * @dev: valid struct device pointer + * + * Detaches the provided device from a previously attached map. + * This overwrites the dma_ops pointer with appropriate non-IOMMU ops. + */ +void arm_iommu_detach_device(struct device *dev) +{ + struct dma_iommu_mapping *mapping; + + mapping = to_dma_iommu_mapping(dev); + if (!mapping) { + dev_warn(dev, "Not attached\n"); + return; + } + + iommu_detach_device(mapping->domain, dev); + arm_iommu_release_device(dev); +} EXPORT_SYMBOL_GPL(arm_iommu_detach_device); static void arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size, diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 780fb70715770d..61444b2a11e217 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -90,6 +90,10 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group, static struct iommu_group *iommu_group_get_for_dev(struct device *dev); static ssize_t iommu_group_store_type(struct iommu_group *group, const char *buf, size_t count); +static struct group_device * +__iommu_group_remove_device(struct iommu_group *group, struct device *dev); +static void __iommu_group_remove_device_sysfs(struct iommu_group *group, + struct group_device *device); #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store) \ struct iommu_group_attribute iommu_group_attr_##_name = \ @@ -330,18 +334,50 @@ int iommu_probe_device(struct device *dev) void iommu_release_device(struct device *dev) { + struct iommu_group *group = dev->iommu_group; const struct iommu_ops *ops; + struct group_device *device; if (!dev->iommu) return; iommu_device_unlink(dev->iommu->iommu_dev, dev); + mutex_lock(&group->mutex); + device = __iommu_group_remove_device(group, dev); ops = dev_iommu_ops(dev); + + /* + * If the group has become empty then ownership must have been released, + * and the current domain must be set back to NULL or the default + * domain. + */ + if (list_empty(&group->devices)) + WARN_ON(group->owner_cnt || + group->domain != group->default_domain); + + /* + * release_device() must stop using any attached domain on the device. + * If there are still other devices in the group they are not effected + * by this callback. + * + * The IOMMU driver must set the device to either an identity or + * blocking translation and stop using any domain pointer, as it is + * going to be freed. + */ if (ops->release_device) ops->release_device(dev); + mutex_unlock(&group->mutex); + + __iommu_group_remove_device_sysfs(group, device); + + /* + * This will eventually call iommu_group_release() which will free the + * iommu_domains. + */ + dev->iommu_group = NULL; + kobject_put(group->devices_kobj); - iommu_group_remove_device(dev); module_put(ops->owner); dev_iommu_free(dev); } @@ -939,44 +975,71 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev) } EXPORT_SYMBOL_GPL(iommu_group_add_device); -/** - * iommu_group_remove_device - remove a device from it's current group - * @dev: device to be removed - * - * This function is called by an iommu driver to remove the device from - * it's current group. This decrements the iommu group reference count. - */ -void iommu_group_remove_device(struct device *dev) +static struct group_device * +__iommu_group_remove_device(struct iommu_group *group, struct device *dev) { - struct iommu_group *group = dev->iommu_group; - struct group_device *tmp_device, *device = NULL; + struct group_device *device; + + lockdep_assert_held(&group->mutex); if (!group) - return; + return NULL; dev_info(dev, "Removing from iommu group %d\n", group->id); - mutex_lock(&group->mutex); - list_for_each_entry(tmp_device, &group->devices, list) { - if (tmp_device->dev == dev) { - device = tmp_device; + list_for_each_entry(device, &group->devices, list) { + if (device->dev == dev) { list_del(&device->list); - break; + return device; } } - mutex_unlock(&group->mutex); + return NULL; +} +static void __iommu_group_remove_device_sysfs(struct iommu_group *group, + struct group_device *device) +{ if (!device) return; sysfs_remove_link(group->devices_kobj, device->name); - sysfs_remove_link(&dev->kobj, "iommu_group"); + sysfs_remove_link(&device->dev->kobj, "iommu_group"); - trace_remove_device_from_group(group->id, dev); + trace_remove_device_from_group(group->id, device->dev); kfree(device->name); kfree(device); - dev->iommu_group = NULL; +} + +/** + * iommu_group_remove_device - remove a device from it's current group + * @dev: device to be removed + * + * This function is used by non-iommu drivers to create non-iommu subystem + * groups (eg VFIO mdev, SPAPR) Using this from inside an iommu driver is an + * abuse. Instead the driver should return the correct group from + * ops->device_group() + */ +void iommu_group_remove_device(struct device *dev) +{ + struct iommu_group *group = dev->iommu_group; + struct group_device *device; + + /* + * Since we don't do ops->release_device() there is no way for the + * driver to stop using any attached domain before we free it. If a + * domain is attached while this function is called it will eventually + * UAF. + * + * Thus it is only useful for cases like VFIO/SPAPR that don't use an + * iommu driver, or for cases like FSL that don't use default domains. + */ + WARN_ON(group->domain); + + mutex_lock(&group->mutex); + device = __iommu_group_remove_device(group, dev); + mutex_unlock(&group->mutex); + __iommu_group_remove_device_sysfs(group, device); kobject_put(group->devices_kobj); } EXPORT_SYMBOL_GPL(iommu_group_remove_device); diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c index 1d42084d02767e..f5b9787d22420c 100644 --- a/drivers/iommu/ipmmu-vmsa.c +++ b/drivers/iommu/ipmmu-vmsa.c @@ -302,11 +302,8 @@ static void ipmmu_utlb_enable(struct ipmmu_vmsa_domain *domain, /* * Disable MMU translation for the microTLB. */ -static void ipmmu_utlb_disable(struct ipmmu_vmsa_domain *domain, - unsigned int utlb) +static void ipmmu_utlb_disable(struct ipmmu_vmsa_device *mmu, unsigned int utlb) { - struct ipmmu_vmsa_device *mmu = domain->mmu; - ipmmu_imuctr_write(mmu, utlb, 0); mmu->utlb_ctx[utlb] = IPMMU_CTX_INVALID; } @@ -649,11 +646,11 @@ static void ipmmu_detach_device(struct iommu_domain *io_domain, struct device *dev) { struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); - struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain); + struct ipmmu_vmsa_device *mmu = to_ipmmu(dev); unsigned int i; for (i = 0; i < fwspec->num_ids; ++i) - ipmmu_utlb_disable(domain, fwspec->ids[i]); + ipmmu_utlb_disable(mmu, fwspec->ids[i]); /* * TODO: Optimize by disabling the context when no device is attached. @@ -849,7 +846,8 @@ static void ipmmu_probe_finalize(struct device *dev) static void ipmmu_release_device(struct device *dev) { - arm_iommu_detach_device(dev); + ipmmu_detach_device(NULL, dev); + arm_iommu_release_device(dev); } static struct iommu_group *ipmmu_find_group(struct device *dev)