* David Woodhouse (dwmw2@xxxxxxxxxxxxx) wrote:
> On Tue, 2011-11-15 at 21:11 -0700, Alex Williamson wrote:
> > We currently manage iommu_coherency on a per-domain basis,
> > choosing the safest setting across the IOMMUs attached to a
> > particular domain. This unfortunately has a bug: when no
> > IOMMUs are attached, the domain defaults to coherent. If we
> > fall into this mode, then later add a device behind a
> > non-coherent IOMMU to that domain, the context entry is
> > updated using the wrong coherency setting, and we get DMAR
> > faults.
> >
> > Since we expect chipsets to be consistent in their coherency
> > setting, we can instead determine the coherency once and use
> > it globally.
>
> (Adding Rajesh).
>
> Hm, it seems I lied to you about this. The non-coherent mode isn't just
> a historical mistake; it's configurable by the BIOS, and we actually
> encourage people to use the non-coherent mode because it makes the
> hardware page walk faster, and so reduces the latency for IOTLB misses.

Interesting, because for the workloads I've tested it's the exact
opposite. I tested with the BIOS both enabling and disabling coherency,
using non-coherent access and streaming DMA (i.e. bare-metal NIC
bandwidth testing), and the IOMMU added something like 10% overhead
when non-coherent vs. coherent.

> In addition to that, the IOMMU associated with the integrated graphics
> is so "special" that it doesn't support coherent mode either. So it *is*
> quite feasible that we'll see a machine where some IOMMUs support
> coherent mode, and some don't.
>
> And thus we do need to address the concern that just assuming
> non-coherent mode will cause unnecessary performance issues, for the
> case where a domain *doesn't* happen to include any of the non-coherent
> IOMMUs.
>
> However... for VM domains I don't think we care. Setting up the page
> tables *isn't* a fast path there (at least not until/unless we support
> exposing an emulated IOMMU to the guest).
>
> The case we care about is *native* DMA, where this cache flush will be
> happening for example in the fast path of network TX/RX. But in *that*
> case, there is only *one* IOMMU to worry about, so it's simple enough to
> do the right thing, surely?

Definitely agreed on the above points: page table setup/teardown is
limited to VMs, and the bare-metal case is the one sensitive to the
overhead of IOMMU flushing.

thanks,
-chris
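
A minimal standalone sketch of the failure mode Alex describes, for
reference. The types and helpers below (struct iommu, struct domain,
MAX_IOMMUS, domain_update_coherency) are invented stand-ins for
illustration, not the actual intel-iommu code:

/*
 * Sketch: a "domain" tracks which IOMMUs are attached and whether
 * context/page-table writes may skip the explicit cache flush.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_IOMMUS 8  /* arbitrary bound for the example */

struct iommu {
    bool ecap_coherent;  /* hardware snoops CPU caches on page walks */
};

struct domain {
    struct iommu *iommus[MAX_IOMMUS];
    size_t nr_iommus;
    bool coherent;       /* may we skip clflush for this domain? */
};

/* Recompute the safest setting across all attached IOMMUs. */
static void domain_update_coherency(struct domain *d)
{
    size_t i;

    /*
     * The bug under discussion: with no IOMMUs attached this loop
     * runs zero times, so a "default to true" silently claims
     * coherency.  If a device behind a non-coherent IOMMU attaches
     * later, the context entry was already written without a flush,
     * the hardware page walk sees stale data, and we get DMAR
     * faults.  Defaulting an empty domain to non-coherent, and
     * re-evaluating on every attach, is the safe choice.
     */
    d->coherent = d->nr_iommus > 0;
    for (i = 0; i < d->nr_iommus; i++)
        if (!d->iommus[i]->ecap_coherent)
            d->coherent = false;
}

static void domain_attach_iommu(struct domain *d, struct iommu *iommu)
{
    if (d->nr_iommus < MAX_IOMMUS)
        d->iommus[d->nr_iommus++] = iommu;
    domain_update_coherency(d);  /* re-evaluate on every attach */
}

int main(void)
{
    struct iommu gfx = { .ecap_coherent = false };  /* the "special" IGD IOMMU */
    struct iommu nic = { .ecap_coherent = true };
    struct domain dom = { .nr_iommus = 0 };

    domain_update_coherency(&dom);
    printf("empty domain coherent?   %d\n", dom.coherent);  /* 0: safe default */

    domain_attach_iommu(&dom, &nic);
    printf("nic-only domain coherent? %d\n", dom.coherent); /* 1 */

    domain_attach_iommu(&dom, &gfx);
    printf("nic+gfx domain coherent?  %d\n", dom.coherent); /* 0 */

    return 0;
}

Running it prints 0/1/0: the wrong answer comes from the empty-domain
default, not from the per-IOMMU check itself.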