Re: [PATCH V4] iommu/io-pgtable-arm: Optimise non-coherent unmap

Jason Gunthorpe <jgg@xxxxxxxx> · Mon, 9 Sep 2024 12:09:32 -0300

On Wed, Sep 04, 2024 at 09:24:34AM -0700, Rob Clark wrote:
> Btw, this seems to be causing iommu faults for me for what (according
> to a sw pgtable walk) should be a valid mapping, indicating
> missing/incomplete tlb invalidation.  This is with drm/msm (which
> probably matters, since it implements it's own iommu_flush_ops) on
> x1e80100 (which probably doesn't matter.. but it is an mmu-500 in case
> it does).
> 
> I _think_ what is causing this is the change in ordering of
> __arm_lpae_clear_pte() (dma_sync_single_for_device() on the pgtable
> memory) and io_pgtable_tlb_flush_walk().  I'm not entirely sure how
> this patch is supposed to work correctly in the face of other
> concurrent translations (to buffers unrelated to the one being
> unmapped(), because after the io_pgtable_tlb_flush_walk() we can have
> stale data read back into the tlb.

You mean this?

 			if (!iopte_leaf(pte, lvl, iop->fmt)) {
+				__arm_lpae_clear_pte(&ptep[i], &iop->cfg, 1);
+
 				/* Also flush any partial walks */
 				io_pgtable_tlb_flush_walk(iop, iova + i * size, size,
 							  ARM_LPAE_GRANULE(data));
 				__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));

I would say it should work because
 1) The pte is cleared and cache flushed before the iotlb is cleared
    by the added __arm_lpae_clear_pte()
 2) This is not a 'shared table' since it is fully covered by the
    size being unmapped. The caller must ensure there are no
    inersecting concurrent map/unmaps.
 3) The double zeroing doesn't matter because of #2, no races are
    permitted.

Jason