> From: Baolu Lu <baolu.lu@xxxxxxxxxxxxxxx>
> Sent: Tuesday, February 7, 2023 2:46 PM
>
> On 2023/2/6 11:48, Tian, Kevin wrote:
> >> From: Baolu Lu <baolu.lu@xxxxxxxxxxxxxxx>
> >> Sent: Saturday, February 4, 2023 2:32 PM
> >>
> >> On 2023/2/4 7:04, Jacob Pan wrote:
> >>> Intel IOMMU driver implements IOTLB flush queue with domain selective
> >>> or PASID selective invalidations. In this case there's no need to track
> >>> IOVA page range and sync IOTLBs, which may cause a significant
> >>> performance hit.
> >>
> >> [Add cc Robin]
> >>
> >> If I understand this patch correctly, this might be caused by the below
> >> helper:
> >>
> >> /**
> >>  * iommu_iotlb_gather_add_page - Gather for page-based TLB invalidation
> >>  * @domain: IOMMU domain to be invalidated
> >>  * @gather: TLB gather data
> >>  * @iova: start of page to invalidate
> >>  * @size: size of page to invalidate
> >>  *
> >>  * Helper for IOMMU drivers to build invalidation commands based on
> >>  * individual pages, or with page size/table level hints which cannot
> >>  * be gathered if they differ.
> >>  */
> >> static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
> >>                                                struct iommu_iotlb_gather *gather,
> >>                                                unsigned long iova, size_t size)
> >> {
> >>         /*
> >>          * If the new page is disjoint from the current range or is
> >>          * mapped at a different granularity, then sync the TLB so that
> >>          * the gather structure can be rewritten.
> >>          */
> >>         if ((gather->pgsize && gather->pgsize != size) ||
> >>             iommu_iotlb_gather_is_disjoint(gather, iova, size))
> >>                 iommu_iotlb_sync(domain, gather);
> >>
> >>         gather->pgsize = size;
> >>         iommu_iotlb_gather_add_range(gather, iova, size);
> >> }
> >>
> >> As the comment for iommu_iotlb_gather_is_disjoint() says,
> >>
> >> "...For many IOMMUs, flushing the IOMMU in this case is better
> >> than merging the two, which might lead to unnecessary invalidations.
> >> ..."
> >>
> >> So, perhaps the right fix for this performance issue is to add
> >>
> >>         if (!gather->queued)
> >>
> >> in iommu_iotlb_gather_add_page() or iommu_iotlb_gather_is_disjoint()?
> >> It should benefit other archs as well.
> >>
> >
> > There are only two callers of this helper: intel and arm-smmu-v3.
> >
> > Looks like other drivers just implement direct flush via
> > io_pgtable_tlb_add_page().
> >
> > And their unmap callback typically does:
> >
> >         if (!iommu_iotlb_gather_queued(gather))
> >                 io_pgtable_tlb_add_page();
> >
> > From this angle it's the same policy as Jacob's patch, i.e. if it's
> > already queued then there is no need to further call the direct-flush
> > optimization.
>
> Perhaps we can use iommu_iotlb_gather_queued() to replace the direct
> gather->queued check in this patch as well?

yes
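
For illustration, a minimal sketch of what the suggestion above might look
like: gate the mid-gather sync in iommu_iotlb_gather_add_page() on the
existing iommu_iotlb_gather_queued() helper instead of a direct
gather->queued check. The exact placement of the check is an assumption on
my part, not the patch that was actually posted in this thread:

/*
 * Sketch only: skip the disjoint-range sync when the invalidation is
 * already queued, since the flush queue will invalidate the whole
 * range later anyway.  Uses iommu_iotlb_gather_queued() rather than
 * reading gather->queued directly, as suggested above.
 */
static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
                                               struct iommu_iotlb_gather *gather,
                                               unsigned long iova, size_t size)
{
        if (!iommu_iotlb_gather_queued(gather) &&
            ((gather->pgsize && gather->pgsize != size) ||
             iommu_iotlb_gather_is_disjoint(gather, iova, size)))
                iommu_iotlb_sync(domain, gather);

        gather->pgsize = size;
        iommu_iotlb_gather_add_range(gather, iova, size);
}

Note this sketch only avoids the extra sync; Jacob's patch, per the quoted
commit message, additionally avoids tracking the IOVA page range in the
Intel driver when the flush is queued.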