On 25.03.21 11:55, Oscar Salvador wrote:
On Thu, Mar 25, 2021 at 10:17:33AM +0100, Michal Hocko wrote:
Why do you think it is wrong to initialize/account pages when they are
used? Keep in mind that offline pages are not used until they are
onlined. But vmemmap pages are used since the vmemmap is established
which happens in the hotadd stage.
Yes, that is true.
vmemmap pages are used right when we populate the vmemmap space.
Note: I once herd of a corner-case use case where people offline memory
blocks to then use the "free" memory via /dev/mem for other purposes
("large physical memory"). Not that I encourage such use cases, but they
would be fundamentally broken if the vmemmap ends up on offline memory
and is supposed to keep its state ...
plus the fact that I dislike to place those pages in
ZONE_NORMAL, although they are not movable.
But I think the vmemmap pages should lay within the same zone the pages
they describe, doing so simplifies things, and I do not see any outright
downside.
Well, both ways likely have its pros and cons. Nevertheless, if the
vmemmap storage is independent (which is the case for normal hotplug)
then the state is consistent over hotadd, {online, offline} N times,
hotremove cycles. Which is conceptually reasonable as vmemmap doesn't
go away on each offline.
If you are going to bind accounting to the online/offline stages then
the accounting changes each time you go through the cycle and depending
on the onlining type it would travel among zones. I find it quite
confusing as the storage for vmemmap hasn't changed any of its
properties.
That is a good point I guess.
vmemmap pages do not really go away until the memory is unplugged.
But I see some questions to raise:
- As I said, I really dislike it tiding vmemmap memory to ZONE_NORMAL
unconditionally and this might result in the problems David mentioned.
I remember David and I discussed such problems but the problems with
zones not being contiguos have also been discussed in the past and
IIRC, we reached the conclusion that a maximal effort should be made
to keep them that way, otherwise other things suffer e.g: compaction
code.
So if we really want to move the initialization/account to the
hot-add/hot-remove stage, I would really like to be able to set the
proper zone in there (that is, the same zone where the memory will lay).
Determining the zone when hot-adding does not make too much sense: you
don't know what user space might end up deciding (online_kernel,
online_movable...).
- When moving the initialization/accounting to hot-add/hot-remove,
the section containing the vmemmap pages will remain offline.
It might get onlined once the pages get online in online_pages(),
or not if vmemmap pages span a whole section.
I remember (but maybe David rmemeber better) that that was a problem
wrt. pfn_to_online_page() and hybernation/kdump.
So, if that is really a problem, we would have to care of ot setting
the section to the right state.
Good memory. Indeed, hibernation/kdump won't save the state of the
vmemmap, because the memory is marked as offline and, thus, logically
without any valuable content.
- AFAICS, doing all the above brings us to former times were some
initialization/accounting was done in a previous stage, and I remember
it was pushed hard to move those in online/offline_pages().
Are we ok with that?
As I said, we might have to set the right zone in hot-add stage, as
otherwise problems might come up.
Being that case, would not that also be conflating different concepts
at a wrong phases?
I expressed my opinion already, no need to repeat. Sub-section online
maps would make it cleaner, but I am still not convinced we want/need that.
Do not take me wrong, I quite like Michal's idea, and from a
conceptually point of view I guess it is the right thing to do.
But when evualating risks/difficulty, I am not really sure.
If we can pull that off while setting the right zone (and must be seen
what about the section state), and the outcome is not ugly, I am all for
it.
Also a middel-ground might be something like I previously mentioned(having
a helper in memory_block_action() to do the right thing, so
offline/online_pages() do not get pouled.
As I said, having soemthing like
memory_block_online()/memory_block_offline() could be one way to tackle
it. We only support onlining/offlining of memory blocks and I ripped out
all code that was abusing online_pages/offline_pages ...
So have memory_block_online() call online_pages() and do the accounting
of the vmemmap, with a big fat comment that sections are actually set
online/offline in online_pages/offline_pages(). Could be a simple
cleanup on top of this series ...
--
Thanks,
David / dhildenb