On Thu 25-03-21 09:07:03, Oscar Salvador wrote:
> On Wed, Mar 24, 2021 at 08:16:53PM +0100, David Hildenbrand wrote:
> > > > 1. If the underlying memory block is offline, all sections are offline. Zone
> > > > shrinking code will happily skip over the vmemmap pages and you can end up
> > > > with out-of-zone pages assigned to the zone. Can happen in corner cases.
> > >
> > > You are right. But do we really care? Those pages should be of no
> > > interest to anybody iterating through zones/nodes anyway.
> >
> > Well, we were just discussing getting zone/node links + span right for all
> > pages (including for special reserved pages), because it already resulted in
> > BUGs. So I am not convinced that we *don't* have to care.
> >
> > However, I agree that most code that cares about node/zone spans shouldn't
> > care - e.g., never call set_pfnblock_flags_mask() on such blocks.
> >
> > But I guess there are corner cases where we would end up with
> > zone_is_empty() == true, not sure what that effect would be ... at least the
> > node cannot vanish as we disallow offlining it while we have a memory block
> > linked to it.
>
> Having quickly looked at Michal's change, I have to say that it does not
> look that bad, but I think it is doing the initialization/accounting at
> the wrong stage,

Why do you think it is wrong to initialize/account pages when they are
used? Keep in mind that offline pages are not used until they are
onlined. But vmemmap pages are in use from the moment the vmemmap is
established, which happens in the hotadd stage.

> plus the fact that I dislike to place those pages in
> ZONE_NORMAL, although they are not movable.
> But I think the vmemmap pages should lay within the same zone the pages
> they describe, doing so simplifies things, and I do not see any outright
> downside.

Well, both ways likely have their pros and cons.
Nevertheless, if the vmemmap storage is independent (which is the case
for normal hotplug) then the state is consistent over hotadd, {online,
offline} N times, hotremove cycles, which is conceptually reasonable as
the vmemmap doesn't go away on each offline.

If you are going to bind accounting to the online/offline stages then
the accounting changes each time you go through the cycle and, depending
on the onlining type, it would travel among zones. I find that quite
confusing as the storage for the vmemmap hasn't changed any of its
properties.

[...]
> This is just an idea I did not get to think carefully, but what if we
> do it in helpers right before calling online_pages()/offline_pages()
> in memory_block_action() ?

That would result in less confusing code in the {on,off}lining code
operating on two sets of pfns, nr_pages. But fundamentally I still
consider it suboptimal to have accounting which is detached from the
life cycle. If we really want to go down that path we should have a very
good reason for that.
-- 
Michal Hocko
SUSE Labs
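
To make the life cycle argument above concrete, here is a toy model of the
two accounting strategies. It is only a sketch and not kernel code: the
function names, the event list and the figure of 512 vmemmap pages per
memory block are all made up for illustration, and "Normal"/"Movable" merely
stand in for ZONE_NORMAL/ZONE_MOVABLE.

```python
from collections import Counter

# Hypothetical number of vmemmap pages backing one memory block.
VMEMMAP_PAGES = 512

# One hotplug life cycle: hotadd, online/offline twice with different
# onlining types, then hotremove. The zone only matters for "online".
CYCLE = [
    ("hotadd", None),
    ("online", "Movable"),
    ("offline", None),
    ("online", "Normal"),
    ("offline", None),
    ("hotremove", None),
]


def account_at_hotadd(cycle):
    """Account vmemmap pages when they start being used (hotadd) and
    drop them when the vmemmap goes away (hotremove)."""
    zones = Counter()
    history = []
    for event, _zone in cycle:
        if event == "hotadd":
            zones["Normal"] += VMEMMAP_PAGES
        elif event == "hotremove":
            zones["Normal"] -= VMEMMAP_PAGES
        # online/offline leave the vmemmap accounting untouched
        history.append((event, dict(+zones)))  # +zones drops zero counts
    return history


def account_at_online(cycle):
    """Account vmemmap pages to whatever zone the block is onlined to
    and drop them again on every offline."""
    zones = Counter()
    current = None
    history = []
    for event, zone in cycle:
        if event == "online":
            zones[zone] += VMEMMAP_PAGES
            current = zone
        elif event == "offline":
            zones[current] -= VMEMMAP_PAGES
            current = None
        history.append((event, dict(+zones)))
    return history


if __name__ == "__main__":
    for name, fn in (("hotadd", account_at_hotadd),
                     ("online", account_at_online)):
        print(f"accounting bound to {name}:")
        for event, state in fn(CYCLE):
            print(f"  {event:<10} {state}")
```

In the first variant the counters stay put between hotadd and hotremove. In
the second they appear, vanish and move between zones on every cycle even
though the backing storage itself never changed.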