Re: [PATCH v2] virtio_balloon: add param to skip adjusting pages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



>>>> I'd appreciate a more generic approach for user space to figure out the
>>>> "initial memory size" in a virtualized environment than adding
>>>> some module parameter to virtio-balloon -- if that makes sense.
>>>>
>>>> MemTotal as is expresses how much memory the buddy currently manages,
>>>> for example, excluding early allocations during boot, excluding actually
>>>> unplugged memory and excluding logically unplugged memory. Adjusting that
>>>> value makes perfect sense for virtio-balloon without deflate-on-oom.
>>>>
>>
>> That's a definition of how MemTotal is implemented, but it's not
>> really a specification of the MemTotal API. The closest thing to a
>> real specification I can find is "Total usable RAM (i.e., physical RAM
>> minus a few reserved bits and the kernel binary code)", from the proc
>> man pages. I think there is quite a bit of leeway in changing how
>> exactly MemTotal is implemented without violating the (quite vague)
>> specification or changing any observable semantics of the API. In
>> particular, leaving the pages in the balloon as part of MemTotal is
>> essentially indistinguishable from simply having a non-OOM killable
>> process locking an equivalent amount of memory. So this proposal isn't
>> really introducing any fundamentally new behavior to the Linux kernel.

How to indicate MemTotal completely depends on the intended semantics.
Using balloon inflation to logically unplug memory vs. some kind of
co-operational memory management with the hypervisor.

For co-operational management I would strongly advise using free page
reporting instead if possible. It can't drain the pagecache so far, but
there are approaches being discussed on how to make that happen (e.g.,
using DAMON, or avoiding the guest page cache using virtio-pmem).

>>
>>>>> It's also not clear to me what utility the extra information would
>>>>> provide to userspace. If userspace wants to know how much memory is
>>>>> available, they should use MemAvailable. If userspace wants to have a
>>>>> rough estimate for the maximum amount of memory in the system, they
>>>>> would add together MemTotal and "logically offline pages". The value
>>>>> of MemTotal with a no-deflate-on-oom virtio-balloon is a value with a
>>>>> vague meaning that lies somewhere between the "maximum amount of
>>>>> memory" and the "current amount of memory". I don't really see any
>>>>> situations where it should clearly be used over one of MemAvailable or
>>>>> MemTotal + "logically offline pages".
>>>>
>>>> The issue is that any application that relies on MemTotal in a virtualized
>>>> environment is most probably already suboptimal in some cases. You can
>>>> rely on it and actually later someone will unplug (inflate balloon)
>>>> memory or plug (deflate balloon) memory. Even MemAvailable is suboptimal
>>>> because what about two applications that rely on that information at
>>>> the same time?
>>>>
>>>
>>> BTW, the general issue here is that "we don't know what the hypervisor
>>> will do".
>>
>> I do agree that this is a significant problem. I would expand on it a
>> bit more, to be "since we don't know what the hypervisor will do, we
>> don't know how to treat memory in the balloon". The proposed module
>> parameter is more or less a mechanism to allow the system
>> administrator to tell the virtio_balloon driver how the hypervisor
>> behaves.
> 
> 
> Now that you put it that way, it looks more like this should
> be a feature bit not a module parameter.

It will be slightly better. At least the hypervisor can indicate the
what it's intending on doing.

>> And if the hypervisor will give memory back to the guest when
>> the guest needs it, then I don't think it's not necessary to logically
>> unplug the memory.
> 
> Ideally we would also pair this with sending a signal to device
> that memory is needed.

Such approaches are in general problematic because once the guest is
already OOM, the hypervisor will most likely not react in time and it's
essentially too late.

So you need some policy somewhere that monitors memory consumption and
makes smart decisions. Usually this is implemented in the hypervisor by
monitoring VM stats.

IMHO the device is the wrong place. I recently discussed something
similar offline with potential virtio-mem users.

> 
>> It might be a bit cleaner to explicitly address this in the
>> virtio_balloon protocol. We could add a min_num_pages field to the
>> balloon config, with semantics along the lines of "The host will
>> respond to memory pressure in the guest by deflating the balloon down
>> to min_num_pages, unless it would cause system instability in the
>> host". Given that feature, I think it would be reasonable to only
>> consider min_num_pages as logically unplugged.
> 
> Okay. I think I would do it a bit differently though, make num_pages be
> the min_num_pages, and add an extra_num_pages field for memory that is
> nice to have but ok to drop.
> 
> 
> As long as we are here, can we add a page_shift field please
> so more than 2^44 bytes can be requested?
> 
> 
>>> Maybe "MemMax" actually could make sense, where we expose the maximum
>>> "MemTotal" we had so far since we were up an running. So the semantics
>>> wouldn't be "maximum possible", because we don't know that, but instead
>>> "maximum we had".
>>
>> Rather than add a new API, I think it is much better to make existing
>> APIs behave closer to how they behave in a non-virtualized
>> environment. It is true that we could go through and fix a limited
>> number of special user space applications, but sysconf(_SC_PHYS_PAGES)
>> and /proc/meminfo are not special APIs. Fixing every application that
>> uses them is not feasible, especially when taking into account systems
>> with closed-source applications (e.g. Android). Also, while MemMax is
>> well defined, it has the same issues you brought up earlier -
>> specifically, applications don't know whether the hypervisor will
>> actually ever provide MemMax again, and they don't know whether MemMax
>> is actually the realy maximum amount of memory that could be available
>> in the future. It's not clear to me that it's significantly better or
>> more useful to userspace than simply changing how MemTotal is
>> implemented.
>>
>> -David
> 
> Agree on trying to avoid changing applications, limiting changes
> to device and guest kernel, this has a lot of value.

With free page reporting in place I barely see a future for such
features, but these are just my 2 cents.

Gluing it to a feature bit like "I, the device, will monitor your memory
consumption and adjust if you're in need of more memory" could be done.

-- 
Thanks,

David / dhildenb

_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization



[Index of Archives]     [KVM Development]     [Libvirt Development]     [Libvirt Users]     [CentOS Virtualization]     [Netdev]     [Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux