On Thu, Oct 14, 2021 at 09:11:58AM +0000, Tian, Kevin wrote:

> But in both cases cache maintenance instructions are available from
> guest p.o.v and no coherency semantics would be violated.

You've described how Intel's solution papers over the problem.

In part wbinvd is defined to restore CPU cache coherence after a
no-snoop DMA. Having wbinvd NOP breaks this contract.

To counter-act the broken wbinvd the IOMMU completely prevents the use
of no-snoop DMA. It converts them to snoop instead.

The driver thinks it has no-snoop. The platform appears to support
no-snoop. The driver issues wbinvd - but all of it does nothing.

Don't think any of this is even remotely related to what ARM is doing
here. ARM has neither the broken VM cache ops, nor the IOMMU ability
to suppress no-snoop.

> > > I think the key is whether other archs allow driver to decide DMA
> > > coherency and indirectly the underlying I/O page table format.
> > > If yes, then I don't see a reason why such decision should not be
> > > given to userspace for passthrough case.
> >
> > The choice all comes down to if the other arches have cache
> > maintenance instructions in the VM that *don't work*
>
> Looks vfio always sets IOMMU_CACHE on all platforms as long as
> iommu supports it (true on all platforms except intel iommu which
> is dedicated for GPU):
>
> vfio_iommu_type1_attach_group()
> {
> 	...
> 	if (iommu_capable(bus, IOMMU_CAP_CACHE_COHERENCY))
> 		domain->prot |= IOMMU_CACHE;
> 	...
> }
>
> Should above be set according to whether a device is coherent?

For IOMMU_CACHE there are two questions related to the overloaded
meaning:

 - Should VFIO ask the IOMMU to use non-coherent DMA (ARM meaning)
   This depends on how the VFIO user expects to operate the DMA.
   If the VFIO user can issue cache maintenance ops then IOMMU_CACHE
   should be controlled by the user. I have no idea what platforms
   support user space cache maintenance ops.

 - Should VFIO ask the IOMMU to suppress no-snoop (Intel meaning)
   This depends on whether the VFIO user has access to wbinvd or not.

   wbinvd is a privileged instruction so normally userspace will not
   be able to access it.

   Per Paolo's recommendation there should be a uAPI someplace that
   allows userspace to issue wbinvd - basically the suppress no-snoop
   is also user controllable.

The two things are very similar and ultimately are a choice userspace
should be making.

From something like a qemu perspective things are more murky - eg on
ARM qemu needs to co-ordinate with the guest. Whatever IOMMU_CACHE
mode VFIO is using must match the device coherent flag in the Linux
guest. I'm guessing all Linux guest VMs only use coherent DMA for all
devices today. I don't know if the cache maintenance ops are even
permitted in an ARM VM.

Jason
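
Purely as an illustration of the two questions above, here is a rough
userspace C model of the decision. It is not how vfio_iommu_type1
behaves today, and nothing in it is kernel API: every name (the
sketch_* types, SKETCH_IOMMU_CACHE, sketch_pick_prot) is a hypothetical
stand-in. It assumes the user states whether it wants non-coherent or
no-snoop DMA, and that this is only honoured when the user actually has
a working way to do cache maintenance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the IOMMU_CACHE prot bit; illustration only. */
#define SKETCH_IOMMU_CACHE (1u << 2)

enum sketch_arch { SKETCH_ARM, SKETCH_INTEL };

/* Hypothetical description of a VFIO user; not a real uAPI structure. */
struct sketch_user {
	enum sketch_arch arch;
	bool can_do_cache_maintenance; /* ARM: user can issue cache maintenance ops */
	bool has_working_wbinvd;       /* Intel: user has some way to issue a real wbinvd */
	bool wants_noncoherent;        /* user asked for non-coherent / no-snoop DMA */
};

/*
 * Return the prot bits the (hypothetical) VFIO layer would request.
 * Coherent/snooped DMA stays the default; it is dropped only when the
 * user asked for it AND can restore cache coherence by itself.
 */
static unsigned int sketch_pick_prot(const struct sketch_user *u)
{
	bool can_flush;

	assert(u != NULL);

	if (u->arch == SKETCH_ARM)
		can_flush = u->can_do_cache_maintenance; /* ARM meaning */
	else
		can_flush = u->has_working_wbinvd;       /* Intel meaning */

	if (u->wants_noncoherent && can_flush)
		return 0;

	return SKETCH_IOMMU_CACHE;
}
```

In this model both meanings collapse into the same test, which is the
point made above: they are very similar and the choice belongs to
userspace. The qemu case would add one more input, the coherent flag
the guest sees for the device.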