On Wed, May 18, 2022 at 11:42:04AM -0700, Jacob Pan wrote:
> > Yes.. It seems inefficient to iterate over that xarray multiple times
> > on the flush hot path, but maybe there is little choice. Try to
> > use the xas iterators under the xa_lock spinlock..
> >
> xas_for_each takes a max range, here we don't really have one. So I posted
> v4 w/o using the xas advanced API. Please let me know if you have
> suggestions.

You are supposed to use ULONG_MAX in cases like that.

> xa_for_each takes RCU read lock, it should be fast for tlb flush, right? The
> worst case maybe over flush when we have stale data but should be very rare.

Not really. xa_for_each walks the tree on every iteration, so it is
slower than a linked list walk in any case where the xarray is
multi-node. xas_for_each is able to retain a pointer to where it is in
the tree, so each iteration is usually just a pointer increment.

The downside is that you cannot sleep while doing xas_for_each.

> > The challenge will be accessing the group xa in the first place, but
> > maybe the core code can gain a function call to return a pointer to
> > that XA or something..
>
> I added a helper function to find the matching DMA API PASID in v4.

Again, why are we focused on DMA API? Nothing you build here should be
DMA API beyond the fact that the iommu_domain being attached is the
default domain.

Jason
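
To illustrate the iteration cost difference described above, here is a
small userspace sketch. It is a toy model only, not the kernel XArray:
the two-level tree, FANOUT, struct cursor, toy_next(), walk_restarting()
and walk_cursor() are all hypothetical names made up for this example.
walk_restarting() builds a fresh cursor for every entry, roughly the way
an iterator that re-walks the tree behaves, while walk_cursor() keeps one
cursor alive so it remembers the current leaf. ULONG_MAX is passed as the
"no upper bound" maximum.

```c
#include <limits.h>
#include <stddef.h>

#define FANOUT   64UL                  /* slots per node; arbitrary for this toy */
#define CAPACITY (FANOUT * FANOUT)     /* indices covered by the two levels */

struct leaf { void *slot[FANOUT]; };
struct root { struct leaf *child[FANOUT]; };

static unsigned long node_visits;      /* counts every node dereferenced */

/* Iteration state: which leaf we are in and which index comes next. */
struct cursor {
	struct root *r;
	struct leaf *l;                /* cached leaf, NULL before first descent */
	unsigned long index;           /* next index to examine */
};

/* Return the next present entry at or after c->index, up to max. */
static void *toy_next(struct cursor *c, unsigned long max)
{
	while (c->index <= max && c->index < CAPACITY) {
		unsigned long i = c->index;

		if (!c->l || i % FANOUT == 0) {
			node_visits++;             /* go through the root */
			c->l = c->r->child[i / FANOUT];
			if (!c->l) {               /* empty subtree: skip it */
				c->index = (i / FANOUT + 1) * FANOUT;
				continue;
			}
			node_visits++;             /* step into the leaf */
		}
		c->index++;
		if (c->l->slot[i % FANOUT])
			return c->l->slot[i % FANOUT];
	}
	return NULL;
}

/* Fresh cursor per entry: the leaf pointer is forgotten every time. */
static unsigned long walk_restarting(struct root *r, unsigned long *visits)
{
	unsigned long found = 0, next = 0;

	node_visits = 0;
	for (;;) {
		struct cursor c = { r, NULL, next };

		if (!toy_next(&c, ULONG_MAX))
			break;
		found++;
		next = c.index;
	}
	*visits = node_visits;
	return found;
}

/* One cursor for the whole walk: the leaf pointer is retained. */
static unsigned long walk_cursor(struct root *r, unsigned long *visits)
{
	struct cursor c = { r, NULL, 0 };
	unsigned long found = 0;

	node_visits = 0;
	while (toy_next(&c, ULONG_MAX))
		found++;
	*visits = node_visits;
	return found;
}
```

Both walks return the same number of entries. In this toy the restarting
walk pays at least a root visit plus a leaf visit for every entry found,
while the retained cursor only descends again when it crosses into a new
leaf.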