On 2022-03-17 08:08, Mika Westerberg wrote:
Hi Robin,
On Wed, Mar 16, 2022 at 07:17:57PM +0000, Robin Murphy wrote:
The feeling I'm getting from all this is that if we've got as far as
iommu_dma_protection_show() then it's really too late to meaningfully
mitigate bad firmware.
Note, these are requirements from Microsoft in order for the system to
use the "Kernel DMA protection". Because of this, likelyhood of "bad
firmware" should be quite low since these systems ship with Windows
installed so they should get at least some soft of validation that this
actually works.
We should be able to detect missing
untrusted/external-facing properties as early as nhi_probe(), and if we
could go into "continue at your own risk" mode right then *before* anything
else happens, it all becomes a lot easier to reason about.
I think what we want is that the DMAR opt-in bit is set in the ACPI
tables and that we know the full IOMMU translation is happening for the
devices behind "external facing ports". If that's not the case the
iommu_dma_protection_show() should return 0 meaning the userspace can
ask the user whether the connected device is allowed to use DMA (e.g
PCIe is tunneled or not).
Ah, if it's safe to just say "no protection" in the case that we don't
know for sure, that's even better. Clearly I hadn't quite grasped that
aspect of the usage model, thanks for the nudge!
We do check for the DMAR bit in the Intel IOMMU code and we also do
check that there actually are PCIe ports marked external facing but we
could issue warning there if that's not the case. Similarly if the user
explicitly disabled the IOMMU translation. This can be done inside a new
IOMMU API that does something like the below pseudo-code:
#if IOMMU_ENABLED
bool iommu_dma_protected(struct device *dev)
{
if (dmar_platform_optin() /* or the AMD equivalent */) {
if (!iommu_present(...)) /* whatever is needed to check that the full translation is enabled */
dev_warn(dev, "IOMMU protection disabled!");
/*
* Look for the external facing ports. Should be at
* least 1 or issue warning.
*/
...
return true;
}
return false;
}
#else
static inline bool iommu_dma_protected(struct device *dev)
{
return false;
}
#endif
Then we can make iommu_dma_protection_show() to call this function.
The problem that I've been trying to nail down here is that
dmar_platform_optin() really doesn't mean much for us - I don't know how
Windows' IOMMU drivers work, but there's every chance it's not the
same way as ours. The only material effect that dmar_platform_optin()
has for us is to prevent the user from disabling the IOMMU driver
altogether, and thus ensure that iommu_present() is true. Whether or not
we can actually trust the IOMMU driver to provide reliable protection
depends entirely on whether it knows the PCIe ports are external-facing.
If not, we can only *definitely* know what the IOMMU driver will do for
a given endpoint once that endpoint has appeared behind the port and
iommu_probe_device() has decided what its default domain should be, and
as far as I now understand, that's not an option for Thunderbolt since
it can only happen *after* the tunnel has been authorised and created.
Much as I'm tempted to de-scope back to my IOMMU API cleanup and run
away from the rest of the issue, I think I can crib enough from the
existing code to attempt a reasonable complete fix, so let me give that
a go...
Thanks,
Robin.