On Wed, 11 Aug 2021 21:34:26 +0530, Sai Prakash Ranjan wrote:
> Currently, for iommu_unmap() of a large scatter-gather list with page-size
> elements, the majority of the time is spent flushing partial walks in
> __arm_lpae_unmap(), which is a VA-based TLB invalidation that invalidates
> page-by-page on IOMMUs like arm-smmu-v2 (TLBIVA).
>
> For example: to unmap a 32MB scatter-gather list with page-size elements
> (8192 entries), there are 16 2MB buffer unmaps based on the pgsize (2MB
> for a 4K granule), and each 2MB unmap further results in 512 TLBIVAs
> (2MB/4K), for a total of 8192 TLBIVAs (512*16) across the 16 2MB regions,
> causing a huge overhead.
>
> [...]

Applied to will (for-joerg/arm-smmu/updates), thanks!

[1/1] iommu/arm-smmu: Optimize ->tlb_flush_walk() for qcom implementation
      https://git.kernel.org/will/c/ef75702d6d65

Cheers,
--
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
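For readers following the arithmetic in the quoted commit message, below is a
minimal standalone sketch that reproduces the TLBIVA count. It is purely
illustrative (not kernel code); the constant names and the program itself are
hypothetical, but the numbers match the 32MB / 2MB-block / 4K-granule example
above.

#include <stdio.h>

int main(void)
{
	/* Sizes from the example above; names are illustrative only. */
	const unsigned long sz_32m = 32UL << 20;	/* total SG list to unmap */
	const unsigned long sz_2m  = 2UL  << 20;	/* block pgsize for a 4K granule */
	const unsigned long sz_4k  = 4UL  << 10;	/* per-page TLBIVA stride */

	unsigned long blocks    = sz_32m / sz_2m;	/* 16 2MB buffer unmaps */
	unsigned long per_block = sz_2m  / sz_4k;	/* 512 TLBIVAs per 2MB  */
	unsigned long total     = blocks * per_block;	/* 8192 TLBIVAs overall */

	printf("%lu x 2MB unmaps, %lu TLBIVAs each, %lu TLBIVAs total\n",
	       blocks, per_block, total);
	return 0;
}

Compiled and run, this prints "16 x 2MB unmaps, 512 TLBIVAs each, 8192 TLBIVAs
total", matching the overhead figure the patch sets out to reduce.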