On Wed, Oct 18, 2023 at 09:27:08PM +0100, Joao Martins wrote:
> +static int iommu_v1_read_and_clear_dirty(struct io_pgtable_ops *ops,
> +					  unsigned long iova, size_t size,
> +					  unsigned long flags,
> +					  struct iommu_dirty_bitmap *dirty)
> +{
> +	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
> +	unsigned long end = iova + size - 1;
> +
> +	do {
> +		unsigned long pgsize = 0;
> +		u64 *ptep, pte;
> +
> +		ptep = fetch_pte(pgtable, iova, &pgsize);
> +		if (ptep)
> +			pte = READ_ONCE(*ptep);

It is fine for now, but this is so slow for something that is such a
fast path. We are optimizing away a TLB invalidation but leaving
this???

It is a radix tree, you walk trees by retaining your position at each
level as you go (eg in a function per-level call chain or something)
then ++ is cheap. Re-searching the entire tree every time is madness.

Jason
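
[Illustration, not from the thread: a minimal C sketch of the retained-position
walk being suggested, using a toy two-level table. The layout is not the AMD
io-pgtable format, and every name in it (toy_table, lookup_from_root,
walk_retaining_position, visit) is hypothetical. The point is only that the
upper level is re-read when the walk crosses a leaf-page boundary, not once per
index as a fetch_pte()-per-iova loop does.]

#include <stdint.h>
#include <stdio.h>

#define BITS_PER_LEVEL	9
#define ENTRIES		(1UL << BITS_PER_LEVEL)

struct toy_table {
	uint64_t *l1[ENTRIES];	/* level-1 slots, each points at a leaf page of ENTRIES u64s */
};

/* Re-search from the root for every index: what one lookup per iova amounts to. */
static uint64_t *lookup_from_root(struct toy_table *t, unsigned long idx)
{
	uint64_t *leaf = t->l1[idx >> BITS_PER_LEVEL];

	return leaf ? &leaf[idx & (ENTRIES - 1)] : NULL;
}

/*
 * Retained-position walk: descend once per leaf page, then advance with ++
 * inside it; the upper level is only consulted when the walk crosses a
 * leaf-page boundary. Valid for idx < ENTRIES * ENTRIES in this toy layout.
 */
static void walk_retaining_position(struct toy_table *t,
				    unsigned long start, unsigned long end,
				    void (*visit)(unsigned long idx, uint64_t *pte))
{
	unsigned long idx = start;

	while (idx < end) {
		uint64_t *leaf = t->l1[idx >> BITS_PER_LEVEL];
		/* first index past the current leaf page */
		unsigned long next = (idx | (ENTRIES - 1)) + 1;
		unsigned long stop = next < end ? next : end;

		if (!leaf) {
			idx = stop;	/* skip the whole empty leaf page */
			continue;
		}
		for (; idx < stop; idx++)
			visit(idx, &leaf[idx & (ENTRIES - 1)]);
	}
}

static void print_nonzero(unsigned long idx, uint64_t *pte)
{
	if (*pte)
		printf("retained-walk idx %lu: %#llx\n", idx, (unsigned long long)*pte);
}

int main(void)
{
	static struct toy_table t;
	static uint64_t page[ENTRIES];
	unsigned long i;

	page[3] = 0xdeadbeef;
	t.l1[0] = page;

	/* Slow way: one full root-to-leaf lookup per index. */
	for (i = 0; i < 8; i++) {
		uint64_t *pte = lookup_from_root(&t, i);

		if (pte && *pte)
			printf("root-walk idx %lu: %#llx\n", i, (unsigned long long)*pte);
	}

	/* Fast way: descend once, then ++ through the leaf page. */
	walk_retaining_position(&t, 0, 8, print_nonzero);
	return 0;
}

A real page-table version would keep the same structure but recurse one
function per level (or keep an explicit stack of table pointers), so dirty-bit
harvesting over a large IOVA range touches each upper-level entry once instead
of once per page.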