Re: [RFC PATCH 0/3] Balloon inhibit enhancements

"Michael S. Tsirkin" <mst@xxxxxxxxxx> · Mon, 30 Jul 2018 17:58:49 +0300

On Mon, Jul 30, 2018 at 04:46:25PM +0200, David Hildenbrand wrote:
> On 30.07.2018 15:59, Michael S. Tsirkin wrote:
> > On Mon, Jul 30, 2018 at 03:54:04PM +0200, David Hildenbrand wrote:
> >> On 30.07.2018 15:34, Michael S. Tsirkin wrote:
> >>> On Tue, Jul 17, 2018 at 04:47:31PM -0600, Alex Williamson wrote:
> >>>> Directly assigned vfio devices have never been compatible with
> >>>> ballooning.  Zapping MADV_DONTNEED pages happens completely
> >>>> independent of vfio page pinning and IOMMU mapping, leaving us with
> >>>> inconsistent GPA to HPA mapping between vCPUs and assigned devices
> >>>> when the balloon deflates.  Mediated devices can theoretically do
> >>>> better, if we make the assumption that the mdev vendor driver is fully
> >>>> synchronized to the actual working set of the guest driver.  In that
> >>>> case the guest balloon driver should never be able to allocate an mdev
> >>>> pinned page for balloon inflation.  Unfortunately, QEMU can't know the
> >>>> workings of the vendor driver pinning, and doesn't actually know the
> >>>> difference between mdev devices and directly assigned devices.  Until
> >>>> we can sort out how the vfio IOMMU backend can tell us if ballooning
> >>>> is safe, the best approach is to disable ballooning any time a vfio
> >>>> device is attached.
> >>>>
> >>>> To do that, simply make the balloon inhibitor a counter rather than a
> >>>> boolean, fixup a case where KVM can then simply use the inhibit
> >>>> interface, and inhibit ballooning any time a vfio device is attached.
> >>>> I'm expecting we'll expose some sort of flag similar to
> >>>> KVM_CAP_SYNC_MMU from the vfio IOMMU for cases where we can resolve
> >>>> this.  An addition we could consider here would be yet another device
> >>>> option for vfio, such as x-disable-balloon-inhibit, in case there are
> >>>> mdev devices that behave in a manner compatible with ballooning.
> >>>>
> >>>> Please let me know if this looks like a good idea.  Thanks,
> >>>>
> >>>> Alex
> >>>
> >>> It's probably the only reasonable thing to do for this release.
> >>>
> >>> Long term however, why can't balloon notify vfio as pages are
> >>> added and removed? VFIO could update its mappings then.
> >>
> >> What if the guest is rebooted and pages are silently getting reused
> >> without getting a deflation request first?
> > 
> > Good point. To handle that we'd need to deflate fully on
> > device reset, allowing access to all memory again.
> 
> 1. Doing it from the guest: not reliable. E.g. think about crashes +
> reboots, or a plain "system_reset" in QEMU. Deflation is definitely not
> reliably possible.
> 
> 2. Doing it in QEMU balloon implementation. Not possible. We don't track
> the memory that has been inflated (and also should not do it).
>
> So the only thing we could do is "deflate all guest memory" which
> implies a madvise WILLNEED on all guest memory. We definitely don't want
> this. We could inform vfio about all guest memory.

Exactly. No need to track anything; we just need QEMU to allow access to
all guest memory.

> Everything sounds like a big hack that should be handled internally in
> the kernel.

What exactly do you want the kernel to do?

> -- 
> 
> Thanks,
> 
> David / dhildenb
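
For reference, the counter-based inhibitor described in the cover letter
quoted above could look roughly like the sketch below. This is only an
illustration under stated assumptions, not the code from the patch
series: the names balloon_inhibit(), balloon_is_inhibited() and
balloon_inhibit_count are hypothetical, and C11 atomics stand in for
whatever atomic helpers the real implementation uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not taken from the patch series. */
static atomic_int balloon_inhibit_count;

/*
 * Take (state == true) or drop (state == false) one inhibit reference.
 * With a counter, several independent users (for example two attached
 * vfio devices) can each inhibit ballooning without clobbering each other,
 * which a single boolean cannot express.
 */
void balloon_inhibit(bool state)
{
    int prev = atomic_fetch_add(&balloon_inhibit_count, state ? 1 : -1);

    /* Dropping a reference that was never taken is a caller bug. */
    assert(state || prev > 0);
}

/* Ballooning is inhibited while at least one reference is held. */
bool balloon_is_inhibited(void)
{
    return atomic_load(&balloon_inhibit_count) > 0;
}
```

With this shape, attaching two devices and detaching one leaves
ballooning inhibited; it is only allowed again once the last reference
is dropped.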