On Thu, Apr 28, 2022 at 10:09:15PM +0100, Joao Martins wrote:
> +
> +unsigned int iommu_dirty_bitmap_record(struct iommu_dirty_bitmap *dirty,
> +				       unsigned long iova, unsigned long length)
> +{

Let's put iommu_dirty_bitmap in its own patch, the VFIO driver side
will want to use this same data structure.

> +	while (nbits > 0) {
> +		kaddr = kmap(dirty->pages[idx]) + start_offset;

kmap_local?

> +/**
> + * struct iommu_dirty_bitmap - Dirty IOVA bitmap state
> + *
> + * @iova: IOVA representing the start of the bitmap, the first bit of the bitmap
> + * @pgshift: Page granularity of the bitmap
> + * @gather: Range information for a pending IOTLB flush
> + * @start_offset: Offset of the first user page
> + * @pages: User pages representing the bitmap region
> + * @npages: Number of user pages pinned
> + */
> +struct iommu_dirty_bitmap {
> +	unsigned long iova;
> +	unsigned long pgshift;
> +	struct iommu_iotlb_gather *gather;
> +	unsigned long start_offset;
> +	unsigned long npages;
> +	struct page **pages;

In many (all?) cases I would expect this to be called from a process
context, can we just store the __user pointer here, or is the idea
that with modern kernels poking a u64 to userspace is slower than a
kmap?

I'm particularly concerned that this starts to require high order
allocations with more than 2M of bitmap..

Maybe one direction is to GUP 2M chunks at a time and walk the __user
pointer.

> +static inline void iommu_dirty_bitmap_init(struct iommu_dirty_bitmap *dirty,
> +					   unsigned long base,
> +					   unsigned long pgshift,
> +					   struct iommu_iotlb_gather *gather)
> +{
> +	memset(dirty, 0, sizeof(*dirty));
> +	dirty->iova = base;
> +	dirty->pgshift = pgshift;
> +	dirty->gather = gather;
> +
> +	if (gather)
> +		iommu_iotlb_gather_init(dirty->gather);
> +}

I would expect all the GUPing logic to be here too?

Jason
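
The 2M-at-a-time GUP walk suggested above could look roughly like the
sketch below. This is a userspace analogue with made-up names, just to
show the chunking arithmetic; the real kernel code would call
pin_user_pages() on each chunk, record the dirty bits into the pinned
pages, and unpin before moving on:

```c
#define CHUNK_SIZE (2UL << 20)	/* pin at most 2M of bitmap per round */

/*
 * Length of the next chunk starting at uaddr, never crossing a 2M
 * boundary, so each pin covers at most 512 pages and no high order
 * page array allocation is needed.
 */
static unsigned long bitmap_chunk_len(unsigned long uaddr, unsigned long len)
{
	unsigned long n = CHUNK_SIZE - (uaddr & (CHUNK_SIZE - 1));

	return n < len ? n : len;
}

/*
 * Walk [uaddr, uaddr + len) chunk by chunk; returns how many
 * pin/record/unpin rounds the real code would do.
 */
static unsigned long bitmap_walk(unsigned long uaddr, unsigned long len)
{
	unsigned long nchunks = 0;

	while (len) {
		unsigned long n = bitmap_chunk_len(uaddr, len);

		/* real code: pin_user_pages(), set bits, unpin_user_pages() */
		uaddr += n;
		len -= n;
		nchunks++;
	}
	return nchunks;
}
```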