Hi Jason,

On 11/28/22 01:14, Jason Gunthorpe wrote:
> On Sun, Nov 27, 2022 at 10:13:55PM +0100, Eric Auger wrote:
>>> +static int iommufd_device_setup_msi(struct iommufd_device *idev,
>>> +				     struct iommufd_hw_pagetable *hwpt,
>>> +				     phys_addr_t sw_msi_start,
>>> +				     unsigned int flags)
>>> +{
>>> +	int rc;
>>> +
>>> +	/*
>>> +	 * IOMMU_CAP_INTR_REMAP means that the platform is isolating MSI, and it
>> rather means that the *IOMMU* implements IRQ remapping.
> Not really. The name is a mess, but as it is implemented, it means the
> platform is implementing MSI security. How exactly that is done is not
> really defined, and it doesn't really belong as an iommu property.
> However the security is created, it is done in a way that is
> transparent to the iommu_domain user.

Some ARM platforms implement what you call MSI security but they do not
advertise IOMMU_CAP_INTR_REMAP. Besides, referring to
include/linux/iommu.h:

	IOMMU_CAP_INTR_REMAP,	/* IOMMU supports interrupt isolation */

> MSI security means that the originating device (eg the RID for PCI) is
> validated when the MSI is processed. RIDs can only use MSIs they are
> authorized to use.

Agreed, this is what I described below.

> It doesn't matter how it is done, whether it is remapping HW, fancy
> iommu_domain tricks, or built into the MSI controller. Set this flag
> if the platform is secure and doesn't need the code triggered by
> irq_domain_check_msi_remap().

This is not what is implemented as of now. If the IOMMU supports
interrupt isolation, it advertises IOMMU_CAP_INTR_REMAP. On ARM this
feature is implemented by the ITS MSI controller instead, and the only
way to find out whether the device's MSIs are directed to that kind of
MSI controller is to use irq_domain_check_msi_remap().

>> irq_domain_check_msi_remap() instead means the MSI controller
>> implements that functionality (a given device id is able to trigger
> Not quite, it means that MSI isolation is available, however it is not
> transparent and the iommu_domain user must do the little dance that
> follows.

No, I do not agree on that point. The 'little dance' is needed because
the SMMU does not bypass MSI writes as is done on Intel, so someone
must take care of mapping the MSI transactions. This is the role of the
MSI cookie. To me this is independent of the above discussion about
whether MSI isolation is implemented.

> It does not mean the MSI controller implements the security
> functionality because it is not set on x86.
>
> Logically on x86 we should consider the remapping logic as part of the
> MSI controller even if x86 makes them separated for historical
> reasons.
>
>> MSI #n and this #n gets translated into actual MSI #m) So what you
>> want is to prevent an assigned device from being able to DMA into an
>> MSI doorbell that is not protected by either the IOMMU or the MSI
>> controller. If this happens this means the DMA can generate any kind
>> of MSI traffic that can jeopardize the host or other VMs
> I don't know of any real systems that work like this. ARM standard GIC
> uses a shared ITS page, the IOMMU does nothing to provide MSI
> security. MSI security is built into the GIC because it validates the
> device ID as part of processing the MSI. The IOMMU is only involved
> to grant access to the shared ITS page.

?? Yeah, that's what I said: the SMMU does nothing about MSI security,
the ITS does.
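This is also why the legacy VFIO type1 code accepts a group only if
either source advertises MSI isolation. A simplified sketch of that
check, paraphrased from vfio_iommu_type1 rather than quoted verbatim
(bus and allow_unsafe_interrupts come from the surrounding VFIO code):

	bool msi_remap;

	/*
	 * MSI isolation can be advertised either by the irq domains
	 * (e.g. the ARM GIC ITS) or by the IOMMU itself
	 * (IOMMU_CAP_INTR_REMAP, e.g. x86 remapping HW).
	 */
	msi_remap = irq_domain_check_msi_remap() ||
		    iommu_capable(bus, IOMMU_CAP_INTR_REMAP);

	if (!allow_unsafe_interrupts && !msi_remap) {
		pr_warn("%s: No interrupt remapping support\n", __func__);
		return -EPERM;
	}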
> Intel is similar, it provides security through the MSI controller's
> remapping logic, the only difference is that on Intel the MSI window
> is always present in every iommu_domain (because it occurs before the
> IOMMU), and on ARM you have to do the little dance.

On Intel the MSI window [FEE0_0000h - FEF0_0000h] is bypassed by the
IOMMU. On ARM, MSI transactions are translated, except in the case of
HW MSI RESV regions (I think).

> Even if the iommu is to be involved it is all hidden from this layer.
>
>> and afterwards resv_msi is checked to see if we need to create the
>> so-called msi cookie. This msi cookie tells that the MSI writes are
>> translated by the IOMMU and somebody must create a mapping for those
>> transactions.
> The cookie is always necessary if we are going to use the
> non-transparent mode. That is what makes it the non-transparent
> mode. We have to supply the reserved IOVA range from one part of the
> iommu driver to another part.
>
>>> +	 * creates the MSI window by default in the iommu domain. Nothing
>>> +	 * further to do.
>>> +	 */
>>> +	if (device_iommu_capable(idev->dev, IOMMU_CAP_INTR_REMAP))
>>> +		return 0;
>>> +
>>> +	/*
>>> +	 * On ARM systems that set the global IRQ_DOMAIN_FLAG_MSI_REMAP every
>>> +	 * allocated iommu_domain will block interrupts by default and this
>> It sounds like there is a confusion between IRQ remapping and the fact
>> that MSI writes are not bypassed by the IOMMU.
> I don't think I'm confused :)

As soon as there is a SW MSI RESV region, and only in that case, you
need to put the MSI cookie in place (because it indicates the IOMMU
translates MSI transactions). The fact that the platform provides MSI
security (through the IOMMU or the MSI controller) looks orthogonal to
me.

>>> +static int iommufd_device_do_attach(struct iommufd_device *idev,
>>> +				    struct iommufd_hw_pagetable *hwpt,
>>> +				    unsigned int flags)
>>> +{
>>> +	phys_addr_t sw_msi_start = 0;
>>> +	int rc;
>>> +
>>> +	mutex_lock(&hwpt->devices_lock);
>>> +
>>> +	/*
>>> +	 * Try to upgrade the domain we have, it is an iommu driver bug to
>>> +	 * report IOMMU_CAP_ENFORCE_CACHE_COHERENCY but fail
>>> +	 * enforce_cache_coherency when there are no devices attached to the
>>> +	 * domain.
>>> +	 */
>>> +	if (idev->enforce_cache_coherency && !hwpt->enforce_cache_coherency) {
>>> +		if (hwpt->domain->ops->enforce_cache_coherency)
>>> +			hwpt->enforce_cache_coherency =
>>> +				hwpt->domain->ops->enforce_cache_coherency(
>>> +					hwpt->domain);
>>> +		if (!hwpt->enforce_cache_coherency) {
>>> +			WARN_ON(list_empty(&hwpt->devices));
>>> +			rc = -EINVAL;
>>> +			goto out_unlock;
>>> +		}
>>> +	}
>>> +
>>> +	rc = iopt_table_enforce_group_resv_regions(&hwpt->ioas->iopt, idev->dev,
>>> +						   idev->group, &sw_msi_start);
>>> +	if (rc)
>>> +		goto out_unlock;
>>> +
>> so in the case of any IOMMU_RESV_MSI, iommufd_device_setup_msi() will be
>> called with *sw_msi_start = 0 which will return -EPERM?
>> I don't think this is what we want. In that case I think we want the
>> RESV_MSI region to be taken into account as a RESV region but we don't
>> need the MSI cookie.
> This was sort of sloppily copied from VFIO - we should just delete
> it. There is no driver that sets both, and once the platform asserts
> irq_domain_check_msi_remap() it is going down the non-transparent path
> anyhow and must set a cookie to work. [again the names don't make
> any sense for the functionality]
>
> Failing with EPERM is probably not so bad since the platform is using
> an invalid configuration.
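To make the EPERM point concrete: the tail of
iommufd_device_setup_msi(), elided from the quote above, goes roughly
like this (a paraphrased sketch of the series, not the exact code; rc
is the local declared in the quoted hunk):

	if (irq_domain_check_msi_remap()) {
		/*
		 * Non-transparent (ARM) path: sw_msi_start is the base
		 * of the IOMMU_RESV_SW_MSI region reported by the iommu
		 * driver. With only a HW IOMMU_RESV_MSI region it stays
		 * 0, hence the -EPERM I am questioning.
		 */
		if (WARN_ON(!sw_msi_start))
			return -EPERM;

		/* Hand the reserved IOVA to the dma-iommu cookie code */
		rc = iommu_get_msi_cookie(hwpt->domain, sw_msi_start);
		if (rc)
			return rc;
		return 0;
	}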
> I'm kind of inclined to leave this for right

I don't understand why it is invalid. A HW MSI RESV region is a valid
config, and I am not sure you tested with that kind of setup, did you?

> now since it has all been tested and I don't want to make a
> regression. But I can try to send a patch to tidy it a bit more, maybe
> add an appropriate WARN_ON.
>
> The whole thing is actually backwards. The IOMMU_CAP_INTR_REMAP should
> have been some global irq_has_secure_msi() and everything with
> interrupt remapping, and ARM should set it.

You are revisiting this IOMMU_CAP_INTR_REMAP definition... this is not
what is documented in the header file.

> Then the domain should have had a domain cap
> IOMMU_CAP_REQUIRE_MSI_WINDOW_SETUP to do the little dance ARM drivers
> need.
>
> And even better would have been to not have the little dance in the
> first place, as we don't really need the iommu_domain user to convey
> the fixed MSI window from one part of the iommu driver to another.
>
> We may try to fix this up when we get to doing nesting on ARM, or
> maybe just leave the confusing sort-of-working thing as-is. I don't
> know.
>
> Jason

Eric