On Thu, Sep 30, 2021 at 09:35:45AM +0000, Tian, Kevin wrote:
> > The Intel functional issue is that Intel blocks the cache maintenance
> > ops from the VM and the VM has no way to self-discover that the cache
> > maintenance ops don't work.
>
> the VM doesn't need to know whether the maintenance ops
> actually work.

Which is the whole problem.

Intel has a design where the device driver tells the device to issue
non-cacheable TLPs. The driver is supposed to know if it can issue the
cache maintenance instructions - if it can, then it should ask the
device to issue no-snoop TLPs.

For instance, the same PCI driver on non-x86 should never ask the
device to issue no-snoop TLPs, because it has no idea how to restore
cache coherence on e.g. ARM.

Do you see the issue? This configuration, where the hypervisor silently
makes wbsync a NOP, breaks the x86 architecture because the guest has
no idea it can no longer use no-snoop features.

Using the IOMMU to forcibly prevent the device from issuing no-snoop
makes this whole issue of the broken wbsync moot.

It is important to be really clear on what this is about - this is not
some idealized, nice IOMMU feature - it is working around a lot of
backwards compatibility baggage that is probably completely unique to
x86.

> > Other arches don't seem to have this specific problem...
>
> I think the key is whether other archs allow the driver to decide DMA
> coherency, and indirectly the underlying I/O page table format.
> If yes, then I don't see a reason why such a decision should not be
> given to userspace for the passthrough case.

The choice all comes down to whether the other arches have cache
maintenance instructions in the VM that *don't work*.
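
To make the driver side concrete, here is a rough, untested sketch of
the decision I'm describing. foo_setup_no_snoop() is a made-up hook,
not an existing driver; PCI_EXP_DEVCTL_NOSNOOP_EN and the
pcie_capability_*_word() helpers are the normal kernel interfaces:

#include <linux/pci.h>

/*
 * Hypothetical driver hook, sketch only.  The point is that only the
 * driver knows whether the CPU side can clean up after non-snooped DMA.
 */
static void foo_setup_no_snoop(struct pci_dev *pdev)
{
#ifdef CONFIG_X86
	/*
	 * On x86 the driver can restore coherence itself (wbinvd/clflush
	 * after non-snooped DMA), so it may permit the device to set the
	 * no-snoop attribute in its TLPs.
	 */
	pcie_capability_set_word(pdev, PCI_EXP_DEVCTL,
				 PCI_EXP_DEVCTL_NOSNOOP_EN);
#else
	/*
	 * On other architectures (e.g. ARM) the same driver has no
	 * portable way to make the caches coherent again, so it must
	 * never ask the device for no-snoop.
	 */
	pcie_capability_clear_word(pdev, PCI_EXP_DEVCTL,
				   PCI_EXP_DEVCTL_NOSNOOP_EN);
#endif
}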
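
And the IOMMU side - this is roughly what the VFIO type1 code does when
the IOMMU can enforce coherency; map_coherent() is just an illustrative
wrapper, not real code from the tree:

#include <linux/iommu.h>
#include <linux/pci.h>

/*
 * Sketch only.  If the IOMMU reports IOMMU_CAP_CACHE_COHERENCY, then
 * mapping with IOMMU_CACHE forces the DMA to be snooped regardless of
 * any no-snoop attribute the device sets, so a guest whose wbinvd has
 * been turned into a NOP cannot lose coherency.
 */
static int map_coherent(struct iommu_domain *domain, unsigned long iova,
			phys_addr_t paddr, size_t size)
{
	int prot = IOMMU_READ | IOMMU_WRITE;

	if (iommu_capable(&pci_bus_type, IOMMU_CAP_CACHE_COHERENCY))
		prot |= IOMMU_CACHE;

	return iommu_map(domain, iova, paddr, size, prot);
}

Jason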