Re: [PATCH v1 0/3] mm/memory_hotplug: Export generic_online_page()

David Hildenbrand <david@xxxxxxxxxx> · Mon, 23 Sep 2019 11:31:30 +0200

On 23.09.19 10:58, Michal Hocko wrote:
> On Fri 20-09-19 10:17:54, David Hildenbrand wrote:
>> On 09.09.19 13:48, David Hildenbrand wrote:
>>> Based on linux/next + "[PATCH 0/3] Remove __online_page_set_limits()"
>>>
>>> Let's replace the __online_page...() functions by generic_online_page().
>>> Hyper-V only wants to delay the actual onlining of un-backed pages, so we
>>> can simpy re-use the generic function.
>>>
>>> Only compile-tested.
>>>
>>> Cc: Souptick Joarder <jrdr.linux@xxxxxxxxx>
>>>
>>> David Hildenbrand (3):
>>>   mm/memory_hotplug: Export generic_online_page()
>>>   hv_balloon: Use generic_online_page()
>>>   mm/memory_hotplug: Remove __online_page_free() and
>>>     __online_page_increment_counters()
>>>
>>>  drivers/hv/hv_balloon.c        |  3 +--
>>>  include/linux/memory_hotplug.h |  4 +---
>>>  mm/memory_hotplug.c            | 17 ++---------------
>>>  3 files changed, 4 insertions(+), 20 deletions(-)
>>>
>>
>> Ping, any comments on this one?
> 
> Unification makes a lot of sense to me. You can add
> Acked-by: Michal Hocko <mhocko@xxxxxxxx>
> 
> I will most likely won't surprise if I asked for more here though ;)

I'm not surprised, but definitely not in a negative sense ;) I was
asking myself if we could somehow rework this, too.

> I have to confess I really detest the whole concept of a hidden callback
> with a very weird API. Is this something we can do about? I do realize
> that adding a callback would require either cluttering the existing APIs
> but maybe we can come up with something more clever. Or maybe existing
> external users of online callback can do that as a separate step after
> the online is completed - or is this impossible due to locking
> guarantees?
> 

The use case of this (somewhat special) callback really is to avoid
selected (unbacked in the hypervisor) pages to get put to the buddy just
now, but instead to defer that (sometimes, defer till infinity ;) ).
Especially, to hinder these pages from getting touched at all. Pages
that won't be put to the buddy will usually get PG_offline set (e.g.,
Hyper-V and XEN) - the only two users I am aware of.

For Hyper-V (and also eventually virtio-mem), it is important to set
PG_offline before marking the section to be online (SECTION_IS_ONLINE).
Only this way, PG_offline is properly set on all pfn_to_online_page()
pages, meaning "don't touch this page" - e.g., used to skip over such
pages when suspending or by makedumpfile to skip over such offline pages
when creating a memory dump.

So if we would e.g., try to piggy-back onto the memory_notify()
infrastructure, we could
1. Online all pages to the buddy (dropping the callback)
2. E.g., memory_notify(MEM_ONLINE_PAGES, &arg);
-> in the notifier, pull pages from the buddy, mark sections online
3. Set all involved sections online (online_mem_sections())

However, I am not sure what actually happens after 1. - we are only
holding the device hotplug lock and the memory hotplug lock, so the
pages can just get allocated. Also, it sounds like more work and code
for the same end result (okay, if the rework is really necessary, though).

So yeah, while the current callback might not be optimal, I don't see an
easy and clean way to rework this. With the change in this series we are
at least able to simply defer doing what would have been done without
the callback - not perfect but better.

Do you have anything in mind that could work out and make this nicer?

-- 

Thanks,

David / dhildenb