On 16/7/24 09:55, Jason Gunthorpe wrote:
On Mon, Jul 15, 2024 at 04:37:01PM -0700, Dan Williams wrote:
So from a Linux VM perspective we have a PCI device with an IOMMU,
except that IOMMU flips into IDENTITY if T=0 is used.
From a driver model and DMA API this is totally nutzo :)
Being able to flip from trusted/untrusted and keep IOMMU/DMA/etc
unaffected requires that the vIOMMU can always walk the same IO page
tables stored in trusted VM memory, regardless of whether the device
sends a T=0 or T=1 TLP.
"Keep IOMMU/DMA/etc unaffected" is the hard part.
Yes, but that is not just "unaffected" but it is implying that there
is state in the VM's iommu layer too. If T=0 goes to a different
translation then the DMA API must change behavior while a driver is
bound, which is not something we do today.
Implementations that want something more complicated than that, like
interleaving T=0 and T=1 traffic, need to demonstrate how that is
possible given the iommufd maintainer declares it, *checks notes*,
"totally nutzo".
Oh we can make the iommufd side work out, it is the VM's kernel that
is going to be trouble :)
Even the simpler case of no interleaving, where the same driver starts
with T=0 and changes to T=1, is pretty complex:
    dma_addr1 = dma_map()   <== Must return a bypass address because T=0
    goto_t_1()              <== Now dma_addr1 stops being usable
    dma_addr2 = dma_map()   <== Must return a translated address through the vIOMMU
    dma_unmap(dma_addr1)    <== Well now you've done it. Your kernel explodes.
Maybe the "violence" is we have to unbind the PCI driver and rebind it
to get the goto_t_1() effect..
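
To make the hazard in that sequence concrete, here is a minimal userspace
sketch. It is a toy model, not kernel code: every toy_* name, the IOVA base
and the page size are invented for illustration, and the real DMA API keeps
no per-mapping record of the mode. The model stores that record only so the
mismatch becomes visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: none of these names are real kernel APIs. */
enum toy_mode { TOY_T0_BYPASS, TOY_T1_TRANSLATED };

struct toy_dev {
	enum toy_mode mode;	/* current T=0/T=1 state of the device */
	uint64_t next_iova;	/* trivial bump allocator used for T=1 */
};

struct toy_mapping {
	uint64_t dma_addr;
	enum toy_mode mapped_in;	/* mode that was active at map time */
};

#define TOY_IOVA_BASE 0x100000000ULL	/* hypothetical IOVA window start */
#define TOY_PAGE_SIZE 0x1000ULL

/* T=0: hand back the physical address unchanged. T=1: hand out an IOVA. */
static struct toy_mapping toy_dma_map(struct toy_dev *dev, uint64_t phys)
{
	struct toy_mapping m = { .mapped_in = dev->mode };

	if (dev->mode == TOY_T0_BYPASS) {
		m.dma_addr = phys;
	} else {
		m.dma_addr = TOY_IOVA_BASE + dev->next_iova;
		dev->next_iova += TOY_PAGE_SIZE;
	}
	return m;
}

/* Unmap only makes sense if the device is still in the mode used to map. */
static bool toy_dma_unmap(const struct toy_dev *dev,
			  const struct toy_mapping *m)
{
	return m->mapped_in == dev->mode;
}

/* Stand-in for goto_t_1(): the device switches to translated DMA. */
static void toy_goto_t_1(struct toy_dev *dev)
{
	dev->mode = TOY_T1_TRANSLATED;
}
```

In this model, a mapping taken before toy_goto_t_1() fails toy_dma_unmap()
afterwards, while one taken after the switch unmaps fine. An unbind/rebind
would tear down every old mapping before the mode changes, which is why it
sidesteps the problem.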
(uff, quite a thread, I am catching up)
Why flipping?
If there is a vIOMMU, then the driver in the VM can decide whether it
wants private or shared memory for DMA, pass that new flag to dma_map()
and 1) have DMA memory allocated from the private pool (== no page state
changes) and 2) have the C-bit set in the vIOMMU page table (which is in
the VM memory).
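
A rough sketch of point 2), assuming a made-up PTE layout: the C-bit
position, the address mask and all toy_* names below are hypothetical and
only illustrate a per-mapping private/shared choice being recorded in the
page table entry rather than flipped globally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PTE layout, for illustration only. */
#define TOY_PTE_PRESENT		(1ULL << 0)
#define TOY_PTE_C_BIT		(1ULL << 51)		/* assumed position */
#define TOY_PTE_ADDR_MASK	0x0007fffffffff000ULL	/* bits 12..50 */

/* Build a leaf PTE; "private" stands in for the new dma_map() flag. */
static uint64_t toy_make_pte(uint64_t pa, bool private)
{
	uint64_t pte = (pa & TOY_PTE_ADDR_MASK) | TOY_PTE_PRESENT;

	if (private)
		pte |= TOY_PTE_C_BIT;
	return pte;
}

/* True if the entry was created as a private (encrypted) mapping. */
static bool toy_pte_is_private(uint64_t pte)
{
	return (pte & TOY_PTE_C_BIT) != 0;
}
```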
It is without a vIOMMU that flipping is sort of a problem, but the
driver in the VM can decide on the type of DMA, talk to the TSM, and
only then enable DMA (== bus master); by then things in the HV are
settled, so we are ok.
Talking to the TSM does not really require DMA but even if it did, we
could enable untrusted DMA, do this attestation step, then disable DMA,
tell the HV/TSM to switch DMA to secure and enable DMA, all in the
driver's probe().
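
The ordering described above can be sketched as a small state model. All
toy_* names are invented for illustration and do not correspond to any
existing TSM or PCI interface. The rule that the DMA kind cannot change
while bus mastering is on is an assumption of this sketch, not something
the hardware is known to enforce.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not a real TSM/PCI API. */
enum toy_dma_kind { TOY_DMA_NONE, TOY_DMA_UNTRUSTED, TOY_DMA_SECURE };

struct toy_tdev {
	bool bus_master;	/* models the PCI bus master enable bit */
	enum toy_dma_kind kind;	/* what the HV/TSM has configured */
	bool attested;
};

/* Ask the HV/TSM to switch the DMA kind; refused while DMA is enabled. */
static int toy_set_dma_kind(struct toy_tdev *d, enum toy_dma_kind kind)
{
	if (d->bus_master)
		return -1;	/* no in-flight flipping in this model */
	if (kind == TOY_DMA_SECURE && !d->attested)
		return -1;	/* secure DMA only after attestation */
	d->kind = kind;
	return 0;
}

static void toy_set_master(struct toy_tdev *d, bool on)
{
	d->bus_master = on;
}

/* Stand-in for the attestation exchange with the TSM. */
static int toy_attest(struct toy_tdev *d)
{
	d->attested = true;
	return 0;
}

/* probe(): untrusted DMA, attest, DMA off, switch to secure, DMA on. */
static int toy_probe(struct toy_tdev *d)
{
	if (toy_set_dma_kind(d, TOY_DMA_UNTRUSTED))
		return -1;
	toy_set_master(d, true);

	if (toy_attest(d)) {
		toy_set_master(d, false);
		return -1;
	}

	toy_set_master(d, false);
	if (toy_set_dma_kind(d, TOY_DMA_SECURE))
		return -1;
	toy_set_master(d, true);
	return 0;
}
```

The switch happens only while bus mastering is off, so no mapping made
under one DMA kind is ever used or unmapped under the other.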
Changing the underlying behavior of the DMA API "in flight" while a
driver is bound seems really dangerous.
Hard to imagine why a driver would want this :)
My point is if we start baking in the assumption that drivers can do
things like the above without addressing how the VIOMMU integration
works we are going to have a *huge mess* to try and introduce VIOMMU
down the road.
I'd be happy if V1 forbade the above entirely.
My V1 says "all IOVAs below X are private and those above are shared"
(X being a hw knob in the absence of a vIOMMU), and I set X to all '1's
just to mark it all private.
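
As a sketch of that rule, assuming a 64-bit IOVA and a strict "below"
comparison (the width of the knob and the treatment of the boundary value
itself are assumptions here, and the toy_* names are made up):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: X set to all '1's so that everything counts as private. */
#define TOY_X_ALL_PRIVATE UINT64_MAX

/* IOVAs strictly below the threshold x are private, the rest shared. */
static bool toy_iova_is_private(uint64_t iova, uint64_t x)
{
	return iova < x;
}
```

With the strict comparison, the single address equal to X falls on the
shared side even when X is all '1's. Whether that matters depends on how
the real knob defines the boundary.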
Jason
--
Alexey