On 30.07.2018 15:59, Michael S. Tsirkin wrote:
> On Mon, Jul 30, 2018 at 03:54:04PM +0200, David Hildenbrand wrote:
>> On 30.07.2018 15:34, Michael S. Tsirkin wrote:
>>> On Tue, Jul 17, 2018 at 04:47:31PM -0600, Alex Williamson wrote:
>>>> Directly assigned vfio devices have never been compatible with
>>>> ballooning. Zapping MADV_DONTNEED pages happens completely
>>>> independently of vfio page pinning and IOMMU mapping, leaving us with
>>>> inconsistent GPA to HPA mapping between vCPUs and assigned devices
>>>> when the balloon deflates. Mediated devices can theoretically do
>>>> better, if we make the assumption that the mdev vendor driver is fully
>>>> synchronized to the actual working set of the guest driver. In that
>>>> case the guest balloon driver should never be able to allocate an mdev
>>>> pinned page for balloon inflation. Unfortunately, QEMU can't know the
>>>> workings of the vendor driver pinning, and doesn't actually know the
>>>> difference between mdev devices and directly assigned devices. Until
>>>> we can sort out how the vfio IOMMU backend can tell us if ballooning
>>>> is safe, the best approach is to disable ballooning any time a vfio
>>>> device is attached.
>>>>
>>>> To do that, simply make the balloon inhibitor a counter rather than a
>>>> boolean, fix up a case where KVM can then simply use the inhibit
>>>> interface, and inhibit ballooning any time a vfio device is attached.
>>>> I'm expecting we'll expose some sort of flag similar to
>>>> KVM_CAP_SYNC_MMU from the vfio IOMMU for cases where we can resolve
>>>> this. An addition we could consider here would be yet another device
>>>> option for vfio, such as x-disable-balloon-inhibit, in case there are
>>>> mdev devices that behave in a manner compatible with ballooning.
>>>>
>>>> Please let me know if this looks like a good idea. Thanks,
>>>>
>>>> Alex
>>>
>>> It's probably the only reasonable thing to do for this release.
>>>
>>> Long term however, why can't balloon notify vfio as pages are
>>> added and removed? VFIO could update its mappings then.
>>
>> What if the guest is rebooted and pages are silently getting reused
>> without getting a deflation request first?
>
> Good point. To handle that, we'd need to deflate fully
> on device reset, allowing access to all memory again.

1. Doing it from the guest: not reliable. E.g. think about crashes +
reboots, or a plain "system_reset" in QEMU. Deflation is definitely not
reliably possible.

2. Doing it in the QEMU balloon implementation: not possible. We don't
track the memory that has been inflated (and also should not do it). So
the only thing we could do is "deflate all guest memory", which implies
a madvise(MADV_WILLNEED) on all guest memory. We definitely don't want
this. We could inform vfio about all guest memory.

All of this sounds like a big hack that should be handled internally in
the kernel.

-- 

Thanks,

David / dhildenb