Hi Jason,

On Wed, 18 May 2022 15:52:05 -0300, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Wed, May 18, 2022 at 11:42:04AM -0700, Jacob Pan wrote:
>
> > > Yes.. It seems inefficient to iterate over that xarray multiple times
> > > on the flush hot path, but maybe there is little choice. Try to
> > > use the xas iterators under the xa_lock spinlock..
> > >
> > xas_for_each takes a max range, here we don't really have one. So I
> > posted v4 w/o using the xas advanced API. Please let me know if you have
> > suggestions.
>
> You are supposed to use ULONG_MAX in cases like that.
>
Got it.

> > xa_for_each takes RCU read lock, it should be fast for tlb flush,
> > right? The worst case maybe over flush when we have stale data but
> > should be very rare.
>
> Not really, xa_for_each walks the tree for every iteration, it is
> slower than a linked list walk in any cases where the xarray is
> multi-node. xas_for_each is able to retain a pointer where it is in
> the tree so each iteration is usually just a pointer increment.
>
Thanks for explaining. Yes, if we have to iterate multiple times,
xas_for_each() is better.

> The downside is you cannot sleep while doing xas_for_each
>
Will do, under the RCU read lock.

> > > The challenge will be accessing the group xa in the first place, but
> > > maybe the core code can gain a function call to return a pointer to
> > > that XA or something..
>
> > I added a helper function to find the matching DMA API PASID in v4.
>
> Again, why are we focused on DMA API? Nothing you build here should be
> DMA API beyond the fact that the iommu_domain being attached is the
> default domain.
The helper is not DMA API specific. It is just a domain-PASID lookup.
Sorry for the confusion.

Thanks,

Jacob
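
For reference, the cost difference between the two iteration styles discussed above can be illustrated with a toy userspace model. This is only a sketch and not the kernel xarray: the two-level tree, the names `toy_store`, `toy_find`, `count_rewalk`, `count_cursor` and the `descents` counter are all hypothetical, introduced purely for illustration. `count_rewalk` restarts from the root for every entry, roughly how xa_for_each is described in the thread, while `count_cursor` keeps its place inside a leaf, roughly how xas_for_each is described.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy two-level radix tree. NOT the kernel xarray, only a model of it. */
#define SLOTS     16                 /* must be a power of two */
#define MAX_INDEX (SLOTS * SLOTS)

struct leaf { void *slot[SLOTS]; };
struct tree { struct leaf *child[SLOTS]; };

/* Number of times an iterator had to start again from the root. */
static unsigned long descents;

/* Store a non-NULL entry; leaves are allocated on demand and never freed here. */
static void toy_store(struct tree *t, unsigned long index, void *entry)
{
	struct leaf **l;

	assert(index < MAX_INDEX);
	l = &t->child[index / SLOTS];
	if (!*l) {
		*l = calloc(1, sizeof(**l));
		if (!*l)
			abort();
	}
	(*l)->slot[index % SLOTS] = entry;
}

/* Find the first present entry at or after *index, always starting at the root. */
static void *toy_find(struct tree *t, unsigned long *index)
{
	unsigned long i;

	descents++;
	for (i = *index; i < MAX_INDEX; i++) {
		struct leaf *l = t->child[i / SLOTS];

		if (!l) {
			i |= SLOTS - 1;  /* skip the rest of this empty leaf range */
			continue;
		}
		if (l->slot[i % SLOTS]) {
			*index = i;
			return l->slot[i % SLOTS];
		}
	}
	return NULL;
}

/* Style 1: one walk from the root per entry (plus one final failed search). */
static unsigned long count_rewalk(struct tree *t)
{
	unsigned long n = 0, index = 0;

	while (toy_find(t, &index)) {
		n++;
		index++;
	}
	return n;
}

/* Style 2: keep a cursor in the current leaf; descend once per populated leaf. */
static unsigned long count_cursor(struct tree *t)
{
	unsigned long n = 0, c, s;

	for (c = 0; c < SLOTS; c++) {
		struct leaf *l = t->child[c];

		if (!l)
			continue;
		descents++;
		for (s = 0; s < SLOTS; s++)
			if (l->slot[s])
				n++;
	}
	return n;
}
```

Both functions return the same count, but with entries spread over several leaves the re-walking style pays one descent per entry while the cursor style pays one per leaf. In the kernel itself the cursor style would presumably be written with XA_STATE() and xas_for_each(&xas, entry, ULONG_MAX) under rcu_read_lock(), skipping retry entries with xas_retry(), and without sleeping inside the loop.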