On 30.07.2018 16:58, Michael S. Tsirkin wrote:
> On Mon, Jul 30, 2018 at 04:46:25PM +0200, David Hildenbrand wrote:
>> On 30.07.2018 15:59, Michael S. Tsirkin wrote:
>>> On Mon, Jul 30, 2018 at 03:54:04PM +0200, David Hildenbrand wrote:
>>>> On 30.07.2018 15:34, Michael S. Tsirkin wrote:
>>>>> On Tue, Jul 17, 2018 at 04:47:31PM -0600, Alex Williamson wrote:
>>>>>> Directly assigned vfio devices have never been compatible with
>>>>>> ballooning. Zapping MADV_DONTNEED pages happens completely
>>>>>> independent of vfio page pinning and IOMMU mapping, leaving us with
>>>>>> inconsistent GPA to HPA mapping between vCPUs and assigned devices
>>>>>> when the balloon deflates. Mediated devices can theoretically do
>>>>>> better, if we make the assumption that the mdev vendor driver is fully
>>>>>> synchronized to the actual working set of the guest driver. In that
>>>>>> case the guest balloon driver should never be able to allocate an mdev
>>>>>> pinned page for balloon inflation. Unfortunately, QEMU can't know the
>>>>>> workings of the vendor driver pinning, and doesn't actually know the
>>>>>> difference between mdev devices and directly assigned devices. Until
>>>>>> we can sort out how the vfio IOMMU backend can tell us if ballooning
>>>>>> is safe, the best approach is to disable ballooning any time a vfio
>>>>>> device is attached.
>>>>>>
>>>>>> To do that, simply make the balloon inhibitor a counter rather than a
>>>>>> boolean, fix up a case where KVM can then simply use the inhibit
>>>>>> interface, and inhibit ballooning any time a vfio device is attached.
>>>>>> I'm expecting we'll expose some sort of flag similar to
>>>>>> KVM_CAP_SYNC_MMU from the vfio IOMMU for cases where we can resolve
>>>>>> this. An addition we could consider here would be yet another device
>>>>>> option for vfio, such as x-disable-balloon-inhibit, in case there are
>>>>>> mdev devices that behave in a manner compatible with ballooning.
>>>>>>
>>>>>> Please let me know if this looks like a good idea. Thanks,
>>>>>>
>>>>>> Alex
>>>>>
>>>>> It's probably the only reasonable thing to do for this release.
>>>>>
>>>>> Long term however, why can't balloon notify vfio as pages are
>>>>> added and removed? VFIO could update its mappings then.
>>>>
>>>> What if the guest is rebooted and pages are silently getting reused
>>>> without getting a deflation request first?
>>>
>>> Good point. To handle that, we'd need to deflate fully
>>> on device reset, allowing access to all memory again.
>>
>> 1. Doing it from the guest: not reliable. E.g. think about crashes +
>> reboots, or a plain "system_reset" in QEMU. Deflation is definitely not
>> reliably possible.
>>
>> 2. Doing it in QEMU balloon implementation. Not possible. We don't track
>> the memory that has been inflated (and also should not do it).
>>
>> So the only thing we could do is "deflate all guest memory" which
>> implies a madvise WILLNEED on all guest memory. We definitely don't want
>> this. We could inform vfio about all guest memory.
>
> Exactly. No need to track anything; we just need QEMU to allow access to
> all guest memory.
>
>> Everything sounds like a big hack that should be handled internally in
>> the kernel.
>
> What exactly do you want the kernel to do?

As already discussed (in this thread? I don't remember), Alex was asking
if there is some kind of notifier way in the kernel to get notified when
a fresh page is being used on memory that was previously madvise
DONTNEEDed. Then that page could be immediately repinned.

-- 

Thanks,

David / dhildenb