Re: [RFC PATCH 0/3] Balloon inhibit enhancements

David Hildenbrand <david@xxxxxxxxxx> · Mon, 30 Jul 2018 17:05:01 +0200

On 30.07.2018 16:58, Michael S. Tsirkin wrote:
> On Mon, Jul 30, 2018 at 04:46:25PM +0200, David Hildenbrand wrote:
>> On 30.07.2018 15:59, Michael S. Tsirkin wrote:
>>> On Mon, Jul 30, 2018 at 03:54:04PM +0200, David Hildenbrand wrote:
>>>> On 30.07.2018 15:34, Michael S. Tsirkin wrote:
>>>>> On Tue, Jul 17, 2018 at 04:47:31PM -0600, Alex Williamson wrote:
>>>>>> Directly assigned vfio devices have never been compatible with
>>>>>> ballooning.  Zapping MADV_DONTNEED pages happens completely
>>>>>> independent of vfio page pinning and IOMMU mapping, leaving us with
>>>>>> inconsistent GPA to HPA mapping between vCPUs and assigned devices
>>>>>> when the balloon deflates.  Mediated devices can theoretically do
>>>>>> better, if we make the assumption that the mdev vendor driver is fully
>>>>>> synchronized to the actual working set of the guest driver.  In that
>>>>>> case the guest balloon driver should never be able to allocate an mdev
>>>>>> pinned page for balloon inflation.  Unfortunately, QEMU can't know the
>>>>>> workings of the vendor driver pinning, and doesn't actually know the
>>>>>> difference between mdev devices and directly assigned devices.  Until
>>>>>> we can sort out how the vfio IOMMU backend can tell us if ballooning
>>>>>> is safe, the best approach is to disable ballooning any time a vfio
>>>>>> device is attached.
>>>>>>
>>>>>> To do that, simply make the balloon inhibitor a counter rather than a
>>>>>> boolean, fix up a case where KVM can then simply use the inhibit
>>>>>> interface, and inhibit ballooning any time a vfio device is attached.
>>>>>> I'm expecting we'll expose some sort of flag similar to
>>>>>> KVM_CAP_SYNC_MMU from the vfio IOMMU for cases where we can resolve
>>>>>> this.  An addition we could consider here would be yet another device
>>>>>> option for vfio, such as x-disable-balloon-inhibit, in case there are
>>>>>> mdev devices that behave in a manner compatible with ballooning.
>>>>>>
>>>>>> Please let me know if this looks like a good idea.  Thanks,
>>>>>>
>>>>>> Alex
>>>>>
>>>>> It's probably the only reasonable thing to do for this release.
>>>>>
>>>>> Long term however, why can't balloon notify vfio as pages are
>>>>> added and removed? VFIO could update its mappings then.
>>>>
>>>> What if the guest is rebooted and pages are silently getting reused
>>>> without getting a deflation request first?
>>>
>>> Good point. To handle that we'd need to deflate fully on
>>> device reset, allowing access to all memory again.
>>
>> 1. Doing it from the guest: not reliable. E.g. think about crashes +
>> reboots, or a plain "system_reset" in QEMU. Deflation is definitely not
>> reliably possible.
>>
>> 2. Doing it in the QEMU balloon implementation. Not possible. We don't track
>> the memory that has been inflated (and also should not do it).
>>
>> So the only thing we could do is "deflate all guest memory" which
>> implies a madvise WILLNEED on all guest memory. We definitely don't want
>> this. We could inform vfio about all guest memory.
> 
> Exactly. No need to track anything we just need QEMU to allow access to
> all guest memory.
> 
>> Everything sounds like a big hack that should be handled internally in
>> the kernel.
> 
> What exactly do you want the kernel to do?

As already discussed (in this thread? I don't remember), Alex was asking
if there is some kind of notifier in the kernel to get notified when
a fresh page is being used on memory that was previously madvise
DONTNEEDed. Then that page could be immediately repinned.
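
For reference, the counter-based inhibitor from the cover letter (a
counter rather than a boolean, so that several users can inhibit
ballooning independently) could be sketched roughly as below. This is
only an illustration: the function and variable names are hypothetical,
not the ones in the actual patches, and the sketch ignores locking and
atomics.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not taken from the patches. */
static int balloon_inhibit_count;

/*
 * Each user that cannot tolerate ballooning (e.g. an attached vfio
 * device) calls this with true when it shows up and with false when
 * it goes away. Calls must be balanced.
 */
static void balloon_inhibit(bool state)
{
    balloon_inhibit_count += state ? 1 : -1;
    assert(balloon_inhibit_count >= 0); /* catches an unbalanced release */
}

/* Ballooning stays disabled as long as at least one user inhibits it. */
static bool balloon_is_inhibited(void)
{
    return balloon_inhibit_count > 0;
}
```

With a plain boolean, the first user to drop its inhibit would re-enable
ballooning for everyone. The counter keeps it disabled until the last
user is gone.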

-- 

Thanks,

David / dhildenb