On Thu, Sep 30, 2021 at 11:33:13AM +0100, Jean-Philippe Brucker wrote:
> On Thu, Sep 30, 2021 at 08:30:42AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe
> > > Sent: Wednesday, September 29, 2021 8:37 PM
> > >
> > > On Wed, Sep 29, 2021 at 08:48:28AM +0000, Tian, Kevin wrote:
> > >
> > > > ARM:
> > > > - set to snoop format if IOMMU_CACHE
> > > > - set to nonsnoop format if !IOMMU_CACHE
> > > > (in both cases TLP snoop bit is ignored?)
> > >
> > > Where do you see this? I couldn't even find this functionality in the
> > > ARM HW manual??
> >
> > Honestly speaking I'm getting confused by the complex attribute
> > transformation control (default, replace, combine, input, output, etc.)
> > in the SMMU manual. Above was my impression after last check, but now
> > I cannot find the necessary info to build the same picture (except below
> > code). :/
> >
> > >
> > > What I saw is ARM linking the IOMMU_CACHE to an IO PTE bit that causes
> > > the cache coherence to be disabled, which is not ignoring no snoop.
> >
> > My impression was that snoop is one way of implementing cache
> > coherency and now since the PTE can explicitly specify cache coherency
> > like below:
> >
> >     else if (prot & IOMMU_CACHE)
> >         pte |= ARM_LPAE_PTE_MEMATTR_OIWB;
> >     else
> >         pte |= ARM_LPAE_PTE_MEMATTR_NC;
> >
> > This setting in concept overrides the snoop attribute from the device, thus
> > making it sort of ignored?
>
> To make sure we're talking about the same thing: "the snoop attribute from
> the device" is the "No snoop" attribute in the PCI TLP, right?
>
> The PTE flags define whether the memory access is cache-coherent or not.
> * WB is cacheable (short for write-back cacheable. Doesn't matter here
>   what OI or RWA mean.)
> * NC is non-cacheable.
>
>          | Normal PCI access | No_snoop PCI access
>   PTE WB | Cacheable         | Non-cacheable
>   PTE NC | Non-cacheable     | Non-cacheable
>
> Cacheable memory accesses participate in cache coherency.
> Non-cacheable accesses go directly to memory and do not cause cache
> allocation.

This table is what I was thinking after reading through the ARM docs.

> On Arm, cache coherency is configured through PTE attributes. I don't think
> PCI No_snoop should be used because it's not necessarily supported
> throughout the system and, as far as I understand, software can't discover
> whether it is.

The use of no-snoop is a behavior of a device. A generic PCI driver
should be able to program the device to generate no-snoop TLPs and
ideally rely on an arch-specific API in the OS to trigger the required
cache maintenance.

It doesn't make much sense for a portable driver to rely on a
non-portable IO PTE flag to control coherency, since that is not a
standards-based approach.

That said, Linux doesn't have a generic DMA API to support no-snoop. The
few GPU drivers that use this stuff just hardwired wbsync on Intel.

What I don't really understand is why ARM, with an IOMMU that supports
PTE WB, has devices where dev_is_dma_coherent() == false? Is it the case
that DMA from those devices ignores the IO PTE's cacheable mode?

Jason