Hi Robin,

On Thu, Mar 17, 2022 at 01:42:56PM +0000, Robin Murphy wrote:
> On 2022-03-17 08:08, Mika Westerberg wrote:
> > Hi Robin,
> >
> > On Wed, Mar 16, 2022 at 07:17:57PM +0000, Robin Murphy wrote:
> > > The feeling I'm getting from all this is that if we've got as far as
> > > iommu_dma_protection_show() then it's really too late to meaningfully
> > > mitigate bad firmware.
> >
> > Note, these are requirements from Microsoft in order for the system to
> > use the "Kernel DMA protection". Because of this, the likelihood of "bad
> > firmware" should be quite low since these systems ship with Windows
> > installed so they should get at least some sort of validation that this
> > actually works.
> >
> > > We should be able to detect missing
> > > untrusted/external-facing properties as early as nhi_probe(), and if we
> > > could go into "continue at your own risk" mode right then *before* anything
> > > else happens, it all becomes a lot easier to reason about.
> >
> > I think what we want is that the DMAR opt-in bit is set in the ACPI
> > tables and that we know the full IOMMU translation is happening for the
> > devices behind "external facing ports". If that's not the case then
> > iommu_dma_protection_show() should return 0, meaning the userspace can
> > ask the user whether the connected device is allowed to use DMA (e.g.
> > PCIe is tunneled or not).
>
> Ah, if it's safe to just say "no protection" in the case that we don't know
> for sure, that's even better. Clearly I hadn't quite grasped that aspect of
> the usage model, thanks for the nudge!

There is some documentation here too, hope it is helpful:

  https://docs.kernel.org/admin-guide/thunderbolt.html

> > We do check for the DMAR bit in the Intel IOMMU code and we also do
> > check that there actually are PCIe ports marked external facing but we
> > could issue a warning there if that's not the case. Similarly if the user
> > explicitly disabled the IOMMU translation.
> > This can be done inside a new IOMMU API that does something like the
> > below pseudo-code:
> >
> > #if IOMMU_ENABLED
> > bool iommu_dma_protected(struct device *dev)
> > {
> > 	if (dmar_platform_optin() /* or the AMD equivalent */) {
> > 		if (!iommu_present(...)) /* whatever is needed to check that the full translation is enabled */
> > 			dev_warn(dev, "IOMMU protection disabled!");
> > 		/*
> > 		 * Look for the external facing ports. Should be at
> > 		 * least 1 or issue warning.
> > 		 */
> > 		...
> >
> > 		return true;
> > 	}
> >
> > 	return false;
> > }
> > #else
> > static inline bool iommu_dma_protected(struct device *dev)
> > {
> > 	return false;
> > }
> > #endif
> >
> > Then we can make iommu_dma_protection_show() call this function.
>
> The problem that I've been trying to nail down here is that
> dmar_platform_optin() really doesn't mean much for us - I don't know how
> Windows' IOMMU drivers work, but there's every chance it's not the same way
> as ours. The only material effect that dmar_platform_optin() has for us is
> to prevent the user from disabling the IOMMU driver altogether, and thus
> ensure that iommu_present() is true. Whether or not we can actually trust
> the IOMMU driver to provide reliable protection depends entirely on whether
> it knows the PCIe ports are external-facing. If not, we can only
> *definitely* know what the IOMMU driver will do for a given endpoint once
> that endpoint has appeared behind the port and iommu_probe_device() has
> decided what its default domain should be, and as far as I now understand,
> that's not an option for Thunderbolt since it can only happen *after* the
> tunnel has been authorised and created.

That's correct. We do know the PCIe root/downstream ports (the external
facing ones) that host the tunneled PCIe topology, but the rest will appear
dynamically after the connection manager has established the protocol
tunnel.
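
To make the rule we seem to be converging on a bit more concrete, here is
a rough userspace model of it. This is only a sketch and not kernel code:
struct port_model, struct platform_model and dma_protected_model() are
hypothetical names made up for illustration, and the assumption baked in
is that we report "protected" only when the opt-in is set, full
translation is enabled, and every port that can host a tunneled PCIe
topology is known to be external-facing. Anything we don't know for sure
ends up as "no protection".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one PCIe port; not a kernel structure. */
struct port_model {
	bool hosts_tunnel;	/* port can host a tunneled PCIe topology */
	bool external_facing;	/* firmware marked the port external-facing */
};

/* Hypothetical summary of the platform state relevant to the decision. */
struct platform_model {
	bool dmar_optin;		/* firmware opt-in flag is set */
	bool translation_enabled;	/* full IOMMU translation is active */
	const struct port_model *ports;
	size_t nr_ports;
};

/*
 * Return true only if every condition is positively known to hold.
 * Any doubt (no opt-in, translation disabled, an unmarked tunnel port,
 * or no tunnel port at all) is reported as "not protected".
 */
bool dma_protected_model(const struct platform_model *p)
{
	size_t i, tunnel_ports = 0;

	if (!p->dmar_optin || !p->translation_enabled)
		return false;

	for (i = 0; i < p->nr_ports; i++) {
		if (!p->ports[i].hosts_tunnel)
			continue;
		/* One unmarked tunnel port is enough to lose the guarantee. */
		if (!p->ports[i].external_facing)
			return false;
		tunnel_ports++;
	}

	/* Expect at least one external-facing port hosting tunnels. */
	return tunnel_ports > 0;
}

/* Small usage example: one marked tunnel port plus an unrelated port. */
void dma_protected_model_selfcheck(void)
{
	const struct port_model ports[] = {
		{ .hosts_tunnel = true,  .external_facing = true  },
		{ .hosts_tunnel = false, .external_facing = false },
	};
	struct platform_model p = {
		.dmar_optin = true,
		.translation_enabled = true,
		.ports = ports,
		.nr_ports = 2,
	};

	assert(dma_protected_model(&p));

	/* User disabled translation: fall back to "no protection". */
	p.translation_enabled = false;
	assert(!dma_protected_model(&p));
}
```

The point of the model is that the answer depends only on things known
about the ports before any tunnel exists, so it can be evaluated early
without waiting for endpoints to show up.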

> Much as I'm tempted to de-scope back to my IOMMU API cleanup and run away
> from the rest of the issue, I think I can crib enough from the existing code
> to attempt a reasonably complete fix, so let me give that a go...

Sure ;-)