On Thu, Sep 30, 2021 at 07:04:46PM -0300, Jason Gunthorpe wrote:
> > On Arm cache coherency is configured through PTE attributes. I don't think
> > PCI No_snoop should be used because it's not necessarily supported
> > throughout the system and, as far as I understand, software can't discover
> > whether it is.
>
> The usage of no-snoop is a behavior of a device. A generic PCI driver
> should be able to program the device to generate no-snoop TLPs and
> ideally rely on an arch specific API in the OS to trigger the required
> cache maintenance.

Well, it is a combination of the device, the root port and the driver
which all need to be in line to use this.

> It doesn't make much sense for a portable driver to rely on a
> non-portable IO PTE flag to control coherency, since that is not a
> standards based approach.
>
> That said, Linux doesn't have a generic DMA API to support
> no-snoop. The few GPUs drivers that use this stuff just hardwired
> wbsync on Intel..

Yes, as usual the GPU folks come up with nasty hacks instead of
providing generic helpers.  Basically all we'd need to support it
in a generic way is:

 - a DMA_ATTR_NO_SNOOP (or DMA_ATTR_FORCE_NONCOHERENT to fit the Linux
   terminology) which treats the current dma_map/unmap/sync calls as
   if dev_is_dma_coherent() returned false
 - a way for the driver to discover that a given architecture / running
   system actually supports this

> What I don't really understand is why ARM, with an IOMMU that supports
> PTE WB, has devices where dev_is_dma_coherent() == false ?

Because no IOMMU in the world can help the fact that a peripheral on the
SoC is not part of the cache coherency protocol.
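
To make the two items in the list above a bit more concrete, here is a
rough user-space model of the decision they imply.  Nothing in it is
existing kernel API: the mock_* names, the struct layout and the flag
value are all made up for illustration, and the attribute does not exist
in the tree today.  It only shows the intended semantics, namely that the
attribute makes the map/unmap/sync paths behave as if the device were
non-coherent, and that a driver should ask whether the system can honour
no-snoop before relying on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical attribute bit; not an existing DMA_ATTR_* flag. */
#define MOCK_DMA_ATTR_FORCE_NONCOHERENT	(1UL << 0)

/* User-space stand-in for the state the DMA layer would consult. */
struct mock_device {
	bool dma_coherent;		/* models dev_is_dma_coherent(dev) */
	bool no_snoop_supported;	/* models the proposed discovery step */
};

/*
 * Second item: can this device / root port / architecture combination
 * honour no-snoop at all?  How the real answer would be derived is
 * exactly the open question; here it is just a stored flag.
 */
static bool mock_dma_can_force_noncoherent(const struct mock_device *dev)
{
	assert(dev != NULL);
	return dev->no_snoop_supported;
}

/*
 * First item: with the attribute set, a mapping is treated as if the
 * device were non-coherent, so cache maintenance is required even on
 * an otherwise coherent device.
 */
static bool mock_dma_needs_cache_maintenance(const struct mock_device *dev,
					     unsigned long attrs)
{
	assert(dev != NULL);
	if (!dev->dma_coherent)
		return true;
	return (attrs & MOCK_DMA_ATTR_FORCE_NONCOHERENT) != 0;
}
```

A driver would call the discovery helper once at probe time and only
enable no-snoop TLPs in the device (and pass the attribute) when it
returns true; otherwise it stays on the ordinary coherent path.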