On Tue, Jul 31, 2018 at 03:50:30PM -0600, Alex Williamson wrote:
> On Tue, 31 Jul 2018 16:07:46 +0100
> "Dr. David Alan Gilbert" <dgilbert@xxxxxxxxxx> wrote:
> 
> > * Alex Williamson (alex.williamson@xxxxxxxxxx) wrote:
> > > On Tue, 31 Jul 2018 15:29:17 +0300
> > > "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> > > 
> > > > On Mon, Jul 30, 2018 at 05:13:26PM -0600, Alex Williamson wrote:
> > > > > v2:
> > > > >  - Use atomic ops for balloon inhibit counter (Peter)
> > > > >  - Allow endpoint driver opt-in for ballooning, vfio-ccw opt-in by
> > > > >    default, vfio-pci opt-in by device option, only allowed for mdev
> > > > >    devices, no support added for platform as there are no platform
> > > > >    mdev devices.
> > > > > 
> > > > > See patch 3/4 for detailed explanation why ballooning and device
> > > > > assignment typically don't mix.  If this eventually changes, flags
> > > > > on the iommu info struct or perhaps device info struct can inform
> > > > > us for automatic opt-in.  Thanks,
> > > > > 
> > > > > Alex
> > > > 
> > > > So this patch seems to block ballooning when vfio is added.
> > > > But what if balloon is added and inflated first?
> > > 
> > > Good point.
> > > 
> > > > I'd suggest making qemu_balloon_inhibit fail in that case,
> > > > and then vfio realize will fail as well.
> > > 
> > > That might be the correct behavior for vfio, but I wonder about the
> > > existing postcopy use case.  Dave Gilbert, what do you think?  We might
> > > need a separate interface for callers that cannot tolerate existing
> > > ballooned pages.  Of course we'll also need another atomic counter to
> > > keep a tally of ballooned pages.  Thanks,
> > 
> > For postcopy, preinflation isn't a problem; our only issue is ballooning
> > during the postcopy phase itself.
> 
> On further consideration, I think device assignment is in the same
> category.
> The balloon inhibitor does not actually stop the guest balloon driver
> from grabbing and freeing pages, it only changes whether QEMU releases
> the pages with madvise(MADV_DONTNEED).  The problem we have with
> ballooning and device assignment is when we have an existing HPA
> mapping in the IOMMU that isn't invalidated by DONTNEED and becomes
> inconsistent when the page is re-populated.  Pages already zapped at
> the time an assigned device is added do not trigger this; those pages
> will be re-populated when pages are pinned for the assigned device.
> This is identical to the scenario of a freshly started VM that doesn't
> use memory preallocation and therefore faults in pages on demand.
> When an assigned device is attached to such a VM, page pinning will
> fault in and lock all of those pages.

Granted this means memory won't be corrupted, but it is also highly
unlikely to be what the user wanted.

> This is observable behavior.  For example, if I start a VM with 16GB
> of RAM, booted to a command prompt, the VM shows less than 1GB of RAM
> resident in the host.  If I set the balloon to 2048, there's no
> observable change in the QEMU process size on the host.  If I hot-add
> an assigned device while we're ballooned down, the resident memory
> size from the host jumps up to 16GB.  All of the zapped pages have
> been reclaimed.  Adjusting ballooning at this point only changes the
> balloon size in the guest; inflating the balloon no longer zaps pages
> from the process.
> 
> The only oddity I see is the one Dave noted in the commit introducing
> balloon inhibiting (371ff5a3f04c):
> 
>     Queueing the requests until after migration would be nice, but is
>     non-trivial, since the set of inflate/deflate requests have to
>     be compared with the state of the page to know what the final
>     outcome is allowed to be.
> 
> So for this example of a 16GB VM ballooned down to 2GB, then an
> assigned device added and subsequently removed, the resident memory
> remains 16GB and I need to deflate the balloon and reinflate it in
> order to zap those pages from the QEMU process.  Therefore, I think
> that with respect to this inquiry, the series stands as is.  Thanks,
> 
> Alex
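For reference, the counter-based inhibit interface mentioned in the v2
changelog ("atomic ops for balloon inhibit counter") can be sketched as
below.  This uses C11 atomics and QEMU-style names purely for
illustration; it is not the actual QEMU implementation.  Multiple users
(e.g. vfio and postcopy) can each take an inhibit reference, and the
balloon is only allowed to zap pages while the count is zero.

```c
/*
 * Illustrative sketch of a reference-counted balloon inhibitor using
 * C11 atomics; QEMU's real code uses its own atomic helpers.
 */
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int balloon_inhibit_count;

/* Each caller passes true to inhibit, and later false to release. */
void qemu_balloon_inhibit(bool state)
{
    if (state) {
        atomic_fetch_add(&balloon_inhibit_count, 1);
    } else {
        atomic_fetch_sub(&balloon_inhibit_count, 1);
    }
}

/* The balloon device checks this before zapping pages. */
bool qemu_balloon_is_inhibited(void)
{
    return atomic_load(&balloon_inhibit_count) > 0;
}
```

Because it is a counter rather than a flag, ballooning stays inhibited
until every inhibitor has released its reference.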