Jason Gunthorpe wrote:
> On Mon, Jul 15, 2024 at 04:37:01PM -0700, Dan Williams wrote:
> > > So from a Linux VM perspective we have a PCI device with an IOMMU,
> > > except that IOMMU flips into IDENTITY if T=0 is used.
> > >
> > > From a driver model and DMA API this is totally nutzo :)
> > >
> > > Being able to flip from trusted/untrusted and keep IOMMU/DMA/etc
> > > unaffected requires that the vIOMMU can always walk the same IO page
> > > tables stored in trusted VM memory, regardless if the device sends a
> > > T=0/1 TLP.
> >
> > "Keep IOMMU/DMA/etc unaffected" is the hard part.
>
> Yes, but that is not just "unaffected" but it is implying that there
> is state in the VM's iommu layer too. If T=0 goes to a different
> translation then the DMA API must change behavior while a driver is
> bound, which is not something we do today.
>
> > Implementations that want something more complicated than that, like
> > interleave T=0 and T=1 traffic, need to demonstrate how that is possible
> > given the iommufd maintainer declares it, *checks notes*, "totally
> > nutzo".
>
> Oh we can make the iommufd side work out, it is the VM's kernel that
> is going to be trouble :)
>
> Even in the simpler case of no-interleave but the same driver will
> start with T=0 and change to T=1 is pretty complex:
>
>   dma_addr1 = dma_map()  <== Must return a bypass address because T=0
>   goto_t_1()             <== Now dma_addr1 stops being usable
>   dma_addr2 = dma_map()  <== Must return a translated address through the vIOMMU
>   dma_unmap(dma_addr1)   <== Well now you've done it. Your kernel explodes.
>
> Maybe the "violance" is we have to unbind the PCI driver and rebind it
> to get the goto_t_1() effect..
>
> Changing the underlying behavior of the DMA API "in flight" while a
> driver is bound seems really dangerous.

Agree.

> My point is if we start baking in the assumption that drivers can do
> things like the above without addressing how the VIOMMU integration
> works we are going to have a *huge mess* to try and introduce VIOMMU
> down the road.
>
> I'd be happy if V1 forbade the above entirely.

Yes, I think the requirement to go through rebind to cross the
untrusted/trusted boundary gives enough simplification to get started.

It also occurs to me that complex devices / drivers that really want
mixed T=0 and T=1 traffic from one PF can ingest the complexity without
burdening the Linux DMA API and IOMMU layers. Provide 2 assignable VFs
instead of 1 and do software driver-to-driver communication between
those trusted and untrusted drivers.
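
To make the "rebind to cross the boundary" rule concrete, here is a toy
userspace model in C. It is only a sketch under loud assumptions: every
name in it (toy_dev, toy_dma_map(), toy_set_mode(), TOY_IOVA_BASE, and so
on) is invented for illustration and is not a kernel, DMA API, or iommufd
interface. It models a T=0 mapping as a bypass address, a T=1 mapping as
an offset "IOVA", and refuses to change T while a driver is bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model only: none of these names exist in the kernel. */
enum tmode { T0_BYPASS, T1_TRANSLATED };

struct toy_dev {
    enum tmode mode;     /* translation the device's DMA currently sees */
    bool driver_bound;   /* true while a driver owns the device */
    unsigned live_maps;  /* outstanding toy_dma_map() handles */
};

struct toy_mapping {
    uint64_t dma_addr;
    enum tmode created_in;  /* mode in effect when the mapping was made */
};

#define TOY_IOVA_BASE 0x100000000ULL  /* arbitrary marker for translated addresses */

/* T=0 hands back the "physical" address, T=1 hands back a fake IOVA. */
static struct toy_mapping toy_dma_map(struct toy_dev *d, uint64_t phys)
{
    struct toy_mapping m = { .created_in = d->mode };

    m.dma_addr = (d->mode == T0_BYPASS) ? phys : TOY_IOVA_BASE + phys;
    d->live_maps++;
    return m;
}

/* Returns -1 for a stale handle, i.e. the "kernel explodes" case. */
static int toy_dma_unmap(struct toy_dev *d, const struct toy_mapping *m)
{
    if (m->created_in != d->mode)
        return -1;
    d->live_maps--;
    return 0;
}

/* The V1 rule discussed above: T may only change while no driver is bound. */
static int toy_set_mode(struct toy_dev *d, enum tmode mode)
{
    if (d->driver_bound)
        return -1;
    d->mode = mode;
    return 0;
}

static void toy_bind(struct toy_dev *d)
{
    d->driver_bound = true;
}

static void toy_unbind(struct toy_dev *d)
{
    /* The model assumes a driver drops all of its mappings on remove. */
    assert(d->live_maps == 0);
    d->driver_bound = false;
}

/* Walks the unbind/rebind sequence; returns 0 if each step behaves as modelled. */
static int toy_demo(void)
{
    struct toy_dev d = { .mode = T0_BYPASS };
    struct toy_mapping m1, m2;

    toy_bind(&d);
    m1 = toy_dma_map(&d, 0x1000);
    if (toy_set_mode(&d, T1_TRANSLATED) == 0)
        return 1;                     /* must be refused while bound */
    if (toy_dma_unmap(&d, &m1) != 0)
        return 2;
    toy_unbind(&d);
    if (toy_set_mode(&d, T1_TRANSLATED) != 0)
        return 3;                     /* allowed once unbound */
    toy_bind(&d);
    m2 = toy_dma_map(&d, 0x1000);
    if (m2.dma_addr == m1.dma_addr)
        return 4;                     /* T=1 address must differ from bypass */
    return toy_dma_unmap(&d, &m2) == 0 ? 0 : 5;
}
```

Because the mode can only change across an unbind, a mapping in this
model never outlives the translation regime that created it, so the
dma_unmap(dma_addr1) hazard from the quoted sequence cannot be reached
without bypassing toy_set_mode().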