> From: Baolu Lu <baolu.lu@xxxxxxxxxxxxxxx> > Sent: Saturday, February 4, 2023 2:32 PM > > On 2023/2/4 7:04, Jacob Pan wrote: > > Intel IOMMU driver implements IOTLB flush queue with domain selective > > or PASID selective invalidations. In this case there's no need to track > > IOVA page range and sync IOTLBs, which may cause significant > performance > > hit. > > [Add cc Robin] > > If I understand this patch correctly, this might be caused by below > helper: > > /** > * iommu_iotlb_gather_add_page - Gather for page-based TLB invalidation > * @domain: IOMMU domain to be invalidated > * @gather: TLB gather data > * @iova: start of page to invalidate > * @size: size of page to invalidate > * > * Helper for IOMMU drivers to build invalidation commands based on > individual > * pages, or with page size/table level hints which cannot be gathered > if they > * differ. > */ > static inline void iommu_iotlb_gather_add_page(struct iommu_domain > *domain, > struct > iommu_iotlb_gather *gather, > unsigned long iova, > size_t size) > { > /* > * If the new page is disjoint from the current range or is > mapped at > * a different granularity, then sync the TLB so that the gather > * structure can be rewritten. > */ > if ((gather->pgsize && gather->pgsize != size) || > iommu_iotlb_gather_is_disjoint(gather, iova, size)) > iommu_iotlb_sync(domain, gather); > > gather->pgsize = size; > iommu_iotlb_gather_add_range(gather, iova, size); > } > > As the comments for iommu_iotlb_gather_is_disjoint() says, > > "...For many IOMMUs, flushing the IOMMU in this case is better > than merging the two, which might lead to unnecessary invalidations. > ..." > > So, perhaps the right fix for this performance issue is to add > > if (!gather->queued) > > in iommu_iotlb_gather_add_page() or iommu_iotlb_gather_is_disjoint()? > It should benefit other arch's as well. > There are only two callers of this helper: intel and arm-smmu-v3. Looks other drivers just implements direct flush via io_pgtable_tlb_add_page(). and their unmap callback typically does: if (!iommu_iotlb_gather_queued(gather)) io_pgtable_tlb_add_page(); from this angle it's same policy as Jacob's does, i.e. if it's already queued then no need to further call optimization for direct flush.