On Mon, Sep 02, 2024 at 09:29:53AM +0000, Mostafa Saleh wrote: > On Fri, Aug 30, 2024 at 01:40:19PM -0300, Jason Gunthorpe wrote: > > On Fri, Aug 30, 2024 at 03:12:54PM +0000, Mostafa Saleh wrote: > > > > + /* > > > > + * If for some reason the HW does not support DMA coherency then using > > > > + * S2FWB won't work. This will also disable nesting support. > > > > + */ > > > > + if (FIELD_GET(IDR3_FWB, reg) && > > > > + (smmu->features & ARM_SMMU_FEAT_COHERENCY)) > > > > + smmu->features |= ARM_SMMU_FEAT_S2FWB; > > > I think that’s for the SMMU coherency which in theory is not related to the > > > master which FWB overrides, so this check is not correct. > > > > Yes, I agree, in theory. > > > > However the driver today already links them together: > > > > case IOMMU_CAP_CACHE_COHERENCY: > > /* Assume that a coherent TCU implies coherent TBUs */ > > return master->smmu->features & ARM_SMMU_FEAT_COHERENCY; > > > > So this hunk was a continuation of that design. > > > > > What I meant in the previous thread that we should set FWB only for coherent > > > masters as (in attach s2): > > > if (smmu->features & ARM_SMMU_FEAT_S2FWB && dev_is_dma_coherent(master->dev) > > > // set S2FWB in STE > > > > I think as I explained in that thread, it is not really correct > > either. There is no reason to block using S2FWB for non-coherent > > masters that are not used with VFIO. The page table will still place > > the correct memattr according to the IOMMU_CACHE flag, S2FWB just > > slightly changes the encoding. > > It’s not just the encoding that changes, as > - Without FWB, stage-2 combine attributes > - While with FWB, it overrides them. You mean there is some incomming attribute in the transaction (obviously not talking PCI here) and S2FWB combines with that? > So a cacheable mapping in stage-2 can lead to a non-cacheable > (or with different cachableitiy attributes) transaction based on the > input. I am not sure though if there is such case in the kernel. If the kernel supplies IOMMU_CACHE then the kernel also skips all the cache flushing. So it would be a functional problem if combining was causing a non-cachable access through a IOMMU_CACHE S2 already. The DMA API would fail if that was the case. > > If anything should be changed then it would be the above > > IOMMU_CAP_CACHE_COHERENCY test, and I don't know if > > dev_is_dma_coherent() would be correct there, or if it should do some > > ACPI inspection or what. > > I agree, I believe that this assumption is not accurate, I am not sure > what is the right approach here, but in concept I think we shouldn’t > enable FWB for non-coherent devices (using dev_is_dma_coherent() or > other check) The DMA API requires that the cachability rules it sets via IOMMU_CACHE are followed. In this way the stricter behavior of S2FWB is a benefit, not a draw back. I'm still not seeing a problm here?? Jason