On Tue, Aug 07, 2018 at 01:31:24PM -0600, Alex Williamson wrote:
> We use a VFIOContainer to associate an AddressSpace to one or more
> VFIOGroups.  The VFIOContainer represents the DMA context for that
> AddressSpace for those VFIOGroups and is synchronized to changes in
> that AddressSpace via a MemoryListener.  For IOMMU backed devices,
> maintaining the DMA context for a VFIOGroup generally involves
> pinning a host virtual address in order to create a stable host
> physical address and then mapping a translation from the associated
> guest physical address to that host physical address into the IOMMU.
>
> While the above maintains the VFIOContainer synchronized to the QEMU
> memory API of the VM, memory ballooning occurs outside of that API.
> Inflating the memory balloon (i.e. cooperatively capturing pages from
> the guest for use by the host) simply uses MADV_DONTNEED to "zap"
> pages from QEMU's host virtual address space.  The page pinning and
> IOMMU mapping above remains in place, negating the host's ability to
> reuse the page, but the host virtual to host physical mapping of the
> page is invalidated outside of QEMU's memory API.
>
> When the balloon is later deflated, attempting to cooperatively
> return pages to the guest, the page is simply freed by the guest
> balloon driver, allowing it to be used in the guest and incurring a
> page fault when that occurs.  The page fault maps a new host physical
> page backing the existing host virtual address, meanwhile the
> VFIOContainer still maintains the translation to the original host
> physical address.  At this point the guest vCPU and any assigned
> devices will map different host physical addresses to the same guest
> physical address.  Badness.
>
> The IOMMU typically does not have page level granularity with which
> it can track this mapping without also incurring inefficiencies in
> using page size mappings throughout.  MMU notifiers in the host
> kernel also provide indicators for invalidating the mapping on
> balloon inflation, not for updating the mapping when the balloon is
> deflated.  For these reasons we assume a default behavior that the
> mapping of each VFIOGroup into the VFIOContainer is incompatible
> with memory ballooning and increment the balloon inhibitor to match
> the attached VFIOGroups.
>
> Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>

Reviewed-by: Peter Xu <peterx@xxxxxxxxxx>

Thanks,

--
Peter Xu