On Fri, Mar 18, 2022 at 09:06:36AM -0600, Alex Williamson wrote:

> There are advantages to each, the 2nd option gives the user more
> visibility, more options to thread, but it also possibly duplicates
> significant data.

The coming mlx5 tracker won't require kernel storage at all, so I
think this is something to tackle if/when someone comes with a device
that uses the CPU to somehow track dirties (probably via a mdev that
is already tracking DMA?)

One thought is to let vfio coordinate a single allocation of a dirty
bitmap xarray among drivers.

Even in the worst case of duplicated bitmaps the memory usage is not
fatally terrible: it is about 32MB per 1TB of guest memory.

> The unmap scenario above is also not quite as cohesive if the user
> needs to poll devices for dirty pages in the unmapped range after
> performing the unmap. It might make sense if the iommufd could
> generate the merged bitmap on unmap as the threading optimization
> probably has less value in that case.

I don't think of it this way. The device tracker has no idea about
munmap/mmap, it just tracks IOVA dirties.

Which is a problem, because any time we alter the IOVA to PFN map we
need to read the device dirties and correlate them back to the actual
CPU pages that were dirtied.

unmap is one case, but nested paging invalidation is another, much
nastier problem. How exactly that can work is a bit of a mystery to me,
as the ultimate IOVA to PFN mapping is rather fuzzy/racy from the view
of the hypervisor.

So, I wouldn't invest effort to make a special kernel API to link
unmap and leave invalidate unsolved. Just keeping them separated seems
to make more sense, and userspace knows better what it is doing. E.g.
vIOMMU cases need to synchronize the dirty state, but other things like
memory unplug don't.
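
As a rough check of the 32MB per 1TB figure above, here is a minimal
sketch of the arithmetic. It assumes one dirty bit per 4KiB page (the
granule is an assumption, a real tracker may use a different one), and
dirty_bitmap_bytes() is a hypothetical helper name, not an existing
kernel function:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tracking granule: one dirty bit per 4 KiB page. */
#define DIRTY_PAGE_SHIFT_4K 12u

/*
 * Hypothetical helper: bytes needed for a flat bitmap holding one bit
 * per page of size (1 << page_shift) over mem_bytes of guest memory.
 * Both the page count and the byte count are rounded up.
 */
static uint64_t dirty_bitmap_bytes(uint64_t mem_bytes, unsigned int page_shift)
{
	uint64_t page_size, pages;

	assert(page_shift < 64);
	page_size = UINT64_C(1) << page_shift;

	/* Round up without overflowing when mem_bytes is near UINT64_MAX. */
	pages = mem_bytes / page_size + (mem_bytes % page_size != 0);

	return pages / 8 + (pages % 8 != 0);
}
```

With 1TiB of guest memory and a 4KiB granule this gives 2^28 bits, i.e.
2^25 bytes, which is 32MiB per bitmap copy. Duplicating the bitmap in
each tracker multiplies that by the number of copies.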