On 02.11.21 11:34, Alexey Makhalov wrote: > >>>>> In add_memory_resource() we hotplug the new node if required and set it >>>>> online. Memory might get onlined later, via online_pages(). >>>> >>>> You are correct. In case of memory hot add, it is true. But in case of adding >>>> CPU with memoryless node, try_node_online() will be called only during CPU >>>> onlining, see cpu_up(). >>>> >>>> Is there any reason why try_online_node() resides in cpu_up() and not in add_cpu()? >>>> I think it would be correct to online node during the CPU hot add to align with >>>> memory hot add. >>> >>> I am not familiar with cpu hotplug, but this doesn't seem to be anything >>> new so how come this became problem only now? >> >> So IIUC, the issue is that we have a node >> >> a) That has no memory >> b) That is offline >> >> This node will get onlined when onlining the CPU as Alexey says. Yet we >> have some code that stumbles over the node and goes ahead trying to use >> the pgdat -- that code is broken. > > You are correct. > >> >> >> If we take a look at build_zonelists() we indeed skip any >> !node_online(node). Any other code should do the same. If the node is >> not online, it shall be ignored because we might not even have a pgdat >> yet -- see hotadd_new_pgdat(). Without node_online(), the pgdat might be >> stale or non-existant. > > Agree, alloc_pages_node() should also do the same. Not exactly to skip the node, > but to fallback to another node if !node_online(node). > alloc_pages_node() can also be hit while onlining the node, creating chicken-egg > problem, see below Right, the issue is also a bit involved when calling alloc_pages_node() on an offline NID. See below. > >> >> >> The node onlining logic when onlining a CPU sounds bogus as well: Let's >> take a look at try_offline_node(). It checks that: >> 1) That no memory is *present* >> 2) That no CPU is *present* >> >> We should online the node when adding the CPU ("present"), not when >> onlining the CPU. > > Possible. > Assuming try_online_node was moved under add_cpu(), let’s > take look on this call stack: > add_cpu() > try_online_node() > __try_online_node() > hotadd_new_pgdat() > At line 1190 we'll have a problem: > 1183 pgdat = NODE_DATA(nid); > 1184 if (!pgdat) { > 1185 pgdat = arch_alloc_nodedata(nid); > 1186 if (!pgdat) > 1187 return NULL; > 1188 > 1189 pgdat->per_cpu_nodestats = > 1190 alloc_percpu(struct per_cpu_nodestat); > 1191 arch_refresh_nodedata(nid, pgdat); > > alloc_percpu() will go for all possible CPUs and will eventually end up > calling alloc_pages_node() trying to use subject nid for corresponding CPU > hitting the same state #2 problem as NODE_DATA(nid) is still NULL and nid > is not yet online. Right, we will end up calling pcpu_alloc_pages()->alloc_pages_node() for each possible CPU. We use cpu_to_node() to come up with the NID. I can only assume that we usually don't get an offline NID for an offline CPU, but instead either NODE=0 or NODE=NUMA_NO_NODE, because ... alloc_pages_node()->__alloc_pages_node() will: VM_WARN_ON((gfp_mask & __GFP_THISNODE) && !node_online(nid)); BUT: prepare_alloc_pages() ac->zonelist = node_zonelist(preferred_nid, gfp_mask); should similarly fail. when de-referencing NULL. -- Thanks, David / dhildenb