> From: Tian, Kevin
> Sent: Saturday, March 19, 2022 4:15 PM
>
> Let me make it more specific by taking vIOMMU as an example.
> No nesting i.e. Qemu generates a merged mapping for GIOVA->HPA
> via iommufd.
>
> iommufd unmap is caused when emulating virtual iotlb invalidation
> request, *after* the guest iommu driver clears the guest I/O page
> table for the specified GIOVA range.
>
> The dirty bits recorded by the device is around the dma addresses
> programmed by the guest, i.e. GIOVA.
>
> Now if qemu pulls dirty bits from vfio device after iommufd unmap,
> how would qemu even know the corresponding PFN/VA for dirty
> GFNs given the guest I/O mapping has been cleared?
>

Thinking more, the real problem is not related to the *before* vs.
*after* thing. :/

If Qemu itself doesn't maintain a virtual iotlb (large enough to
duplicate all valid mappings in the guest I/O page table) and ensure
that the cached mappings for the unmapped range are not zapped before
the dirty bitmap for that range is digested, the whole dirty tracking
is just broken in this scenario, no matter which approach is used and
whether the bitmap is retrieved before or after the iommufd unmap. The
guest mappings for dirtied GIOVAs in the unmapped range have already
disappeared at that point, so the path to find GIOVA->GPA->HVA is just
broken.

I roughly recall that a gap in Qemu's viotlb was discussed when the
dirty bitmap was added to vfio unmap. At that time Qemu's viotlb was
like a normal iotlb, i.e. only caching mappings due to walking the
guest page table for emulating DMA from non-vfio devices in Qemu. That
is definitely inadequate for the aforementioned purpose. But I don't
know whether this gap has been fixed now.

There is no such concern with dpdk or a VM w/o vIOMMU, since the iova
address space is managed by host userspace, which has intrinsic
knowledge about IOVA<->HVA even after iommufd unmap.

This is also fine with hardware nesting. The hardware ensures that all
stage-1 activities converge on the dirty bits of stage-2 IOPTEs. So
userspace can just ignore stage-1 and collect the dirty bitmap
associated with stage-2.

Thanks
Kevin
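
For illustration only, the viotlb requirement described above can be
modeled with a toy Python sketch. ShadowViotlb, map and
unmap_and_digest are hypothetical names, not Qemu or iommufd
interfaces, and page frame numbers stand in for addresses. The sketch
assumes the device reports dirty bits as a set of GIOVA page numbers.

```python
class ShadowViotlb:
    """Toy userspace cache of guest GIOVA->GFN mappings (hypothetical)."""

    def __init__(self):
        self._giova_to_gfn = {}

    def map(self, giova_pfn, gfn):
        # Record a mapping the guest installed in its I/O page table.
        self._giova_to_gfn[giova_pfn] = gfn

    def unmap_and_digest(self, first_pfn, npages, dirty_giova_pfns):
        """Emulate a virtual iotlb invalidation for a GIOVA range.

        The guest I/O page table is already cleared at this point, so
        the cached entry is the only way to translate a dirty GIOVA
        page. Each entry is consumed for translation as it is dropped.
        Returns (dirty GFNs, dirty GIOVA pages that could not be
        translated).
        """
        dirty_gfns = set()
        lost = set()
        for pfn in range(first_pfn, first_pfn + npages):
            gfn = self._giova_to_gfn.pop(pfn, None)
            if pfn in dirty_giova_pfns:
                if gfn is None:
                    lost.add(pfn)        # no path GIOVA->GPA->HVA left
                else:
                    dirty_gfns.add(gfn)
        return dirty_gfns, lost


# Cache that duplicates all valid guest mappings: dirty page recovered.
full = ShadowViotlb()
full.map(0x100, 0x8000)
full.map(0x101, 0x8001)
dirty_full, lost_full = full.unmap_and_digest(0x100, 2, {0x101})

# Cache that never saw the vfio device's mappings (normal iotlb
# behaviour): the dirty GIOVA page cannot be attributed to any GFN.
sparse = ShadowViotlb()
dirty_sparse, lost_sparse = sparse.unmap_and_digest(0x100, 2, {0x101})
```

In the first case the dirty GIOVA page resolves to a GFN regardless of
when the bitmap was pulled relative to the unmap; in the second it is
lost either way, which is the point made above.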