On Mon, Jul 30, 2018 at 04:46:25PM +0200, David Hildenbrand wrote:
> On 30.07.2018 15:59, Michael S. Tsirkin wrote:
> > On Mon, Jul 30, 2018 at 03:54:04PM +0200, David Hildenbrand wrote:
> >> On 30.07.2018 15:34, Michael S. Tsirkin wrote:
> >>> On Tue, Jul 17, 2018 at 04:47:31PM -0600, Alex Williamson wrote:
> >>>> Directly assigned vfio devices have never been compatible with
> >>>> ballooning. Zapping MADV_DONTNEED pages happens completely
> >>>> independent of vfio page pinning and IOMMU mapping, leaving us with
> >>>> inconsistent GPA to HPA mapping between vCPUs and assigned devices
> >>>> when the balloon deflates. Mediated devices can theoretically do
> >>>> better, if we make the assumption that the mdev vendor driver is fully
> >>>> synchronized to the actual working set of the guest driver. In that
> >>>> case the guest balloon driver should never be able to allocate an mdev
> >>>> pinned page for balloon inflation. Unfortunately, QEMU can't know the
> >>>> workings of the vendor driver pinning, and doesn't actually know the
> >>>> difference between mdev devices and directly assigned devices. Until
> >>>> we can sort out how the vfio IOMMU backend can tell us if ballooning
> >>>> is safe, the best approach is to disable ballooning any time a vfio
> >>>> device is attached.
> >>>>
> >>>> To do that, simply make the balloon inhibitor a counter rather than a
> >>>> boolean, fixup a case where KVM can then simply use the inhibit
> >>>> interface, and inhibit ballooning any time a vfio device is attached.
> >>>> I'm expecting we'll expose some sort of flag similar to
> >>>> KVM_CAP_SYNC_MMU from the vfio IOMMU for cases where we can resolve
> >>>> this. An addition we could consider here would be yet another device
> >>>> option for vfio, such as x-disable-balloon-inhibit, in case there are
> >>>> mdev devices that behave in a manner compatible with ballooning.
> >>>>
> >>>> Please let me know if this looks like a good idea. Thanks,
> >>>>
> >>>> Alex
> >>>
> >>> It's probably the only reasonable thing to do for this release.
> >>>
> >>> Long term however, why can't balloon notify vfio as pages are
> >>> added and removed? VFIO could update its mappings then.
> >>
> >> What if the guest is rebooted and pages are silently getting reused
> >> without getting a deflation request first?
> >
> > Good point. To handle that we'd need to deflate fully
> > on device reset, allowing access to all memory again.
>
> 1. Doing it from the guest: not reliable. E.g. think about crashes +
> reboots, or a plain "system_reset" in QEMU. Deflation is definitely not
> reliably possible.
>
> 2. Doing it in QEMU balloon implementation. Not possible. We don't track
> the memory that has been inflated (and also should not do it).
>
> So the only thing we could do is "deflate all guest memory" which
> implies a madvise WILLNEED on all guest memory. We definitely don't want
> this. We could inform vfio about all guest memory.

Exactly. No need to track anything, we just need QEMU to allow access to
all guest memory.

> Everything sounds like a big hack that should be handled internally in
> the kernel.

What exactly do you want the kernel to do?

> --
>
> Thanks,
>
> David / dhildenb
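
For reference, the "counter rather than a boolean" inhibitor from the quoted
patch description could look roughly like the sketch below. This is only an
illustration of the idea, not the code from the patch: the names
balloon_inhibit() and balloon_is_inhibited() are assumptions, and C11 atomics
stand in for whatever primitives the real code uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names; a sketch of the counter idea, not the actual patch. */
static atomic_int balloon_inhibit_count;

/*
 * Each user that needs ballooning disabled (e.g. an attached vfio device)
 * takes a reference with state = true and drops it with state = false.
 */
void balloon_inhibit(bool state)
{
    int delta = state ? 1 : -1;
    int old = atomic_fetch_add(&balloon_inhibit_count, delta);

    /* Dropping more references than were taken would be a caller bug. */
    assert(old + delta >= 0);
}

/* Ballooning stays inhibited as long as at least one user holds a reference. */
bool balloon_is_inhibited(void)
{
    return atomic_load(&balloon_inhibit_count) > 0;
}
```

With a plain boolean, the first device to be unplugged would re-enable
ballooning even while a second inhibiting device is still attached. With a
counter, ballooning only becomes possible again once every user has dropped
its reference.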