>>>> In add_memory_resource() we hotplug the new node if required and set it >>>> online. Memory might get onlined later, via online_pages(). >>> >>> You are correct. In case of memory hot add, it is true. But in case of adding >>> CPU with memoryless node, try_node_online() will be called only during CPU >>> onlining, see cpu_up(). >>> >>> Is there any reason why try_online_node() resides in cpu_up() and not in add_cpu()? >>> I think it would be correct to online node during the CPU hot add to align with >>> memory hot add. >> >> I am not familiar with cpu hotplug, but this doesn't seem to be anything >> new so how come this became problem only now? > > So IIUC, the issue is that we have a node > > a) That has no memory > b) That is offline > > This node will get onlined when onlining the CPU as Alexey says. Yet we > have some code that stumbles over the node and goes ahead trying to use > the pgdat -- that code is broken. You are correct. > > > If we take a look at build_zonelists() we indeed skip any > !node_online(node). Any other code should do the same. If the node is > not online, it shall be ignored because we might not even have a pgdat > yet -- see hotadd_new_pgdat(). Without node_online(), the pgdat might be > stale or non-existant. Agree, alloc_pages_node() should also do the same. Not exactly to skip the node, but to fallback to another node if !node_online(node). alloc_pages_node() can also be hit while onlining the node, creating chicken-egg problem, see below. > > > The node onlining logic when onlining a CPU sounds bogus as well: Let's > take a look at try_offline_node(). It checks that: > 1) That no memory is *present* > 2) That no CPU is *present* > > We should online the node when adding the CPU ("present"), not when > onlining the CPU. Possible. Assuming try_online_node was moved under add_cpu(), let’s take look on this call stack: add_cpu() try_online_node() __try_online_node() hotadd_new_pgdat() At line 1190 we'll have a problem: 1183 pgdat = NODE_DATA(nid); 1184 if (!pgdat) { 1185 pgdat = arch_alloc_nodedata(nid); 1186 if (!pgdat) 1187 return NULL; 1188 1189 pgdat->per_cpu_nodestats = 1190 alloc_percpu(struct per_cpu_nodestat); 1191 arch_refresh_nodedata(nid, pgdat); alloc_percpu() will go for all possible CPUs and will eventually end up calling alloc_pages_node() trying to use subject nid for corresponding CPU hitting the same state #2 problem as NODE_DATA(nid) is still NULL and nid is not yet online. I like the idea of onlining the node when adding the CPU rather then when CPU get online. It will require current patch or another solution to resolve described above chicken-egg problem. PS, earlier this year I initiated discussion about redesigning per_cpu allocator to do not allocate/waste memory chunks for not present CPUs, but it has another complications. Thanks, —Alexey