On Wed, Apr 06, 2022 at 06:52:04AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Wednesday, April 6, 2022 12:16 AM
> >
> > PCIe defines a 'no-snoop' bit in each the TLP which is usually implemented
> > by a platform as bypassing elements in the DMA coherent CPU cache
> > hierarchy. A driver can command a device to set this bit on some of its
> > transactions as a micro-optimization.
> >
> > However, the driver is now responsible to synchronize the CPU cache with
> > the DMA that bypassed it. On x86 this is done through the wbinvd
> > instruction, and the i915 GPU driver is the only Linux DMA driver that
> > calls it.
>
> More accurately x86 supports both unprivileged clflush instructions
> to invalidate one cacheline and a privileged wbinvd instruction to
> invalidate the entire cache. Replacing 'this is done' with 'this may
> be done' is clearer.
>
> >
> > The problem comes that KVM on x86 will normally disable the wbinvd
> > instruction in the guest and render it a NOP. As the driver running in the
> > guest is not aware the wbinvd doesn't work it may still cause the device
> > to set the no-snoop bit and the platform will bypass the CPU cache.
> > Without a working wbinvd there is no way to re-synchronize the CPU cache
> > and the driver in the VM has data corruption.
> >
> > Thus, we see a general direction on x86 that the IOMMU HW is able to block
> > the no-snoop bit in the TLP. This NOP's the optimization and allows KVM to
> > to NOP the wbinvd without causing any data corruption.
> >
> > This control for Intel IOMMU was exposed by using IOMMU_CACHE and
> > IOMMU_CAP_CACHE_COHERENCY, however these two values now have multiple
> > meanings and usages beyond blocking no-snoop and the whole thing has
> > become confused.
>
> Also point out your finding about AMD IOMMU?

Done, thanks

Jason