On Tue 16-11-21 20:22:49, Alexey Makhalov wrote:
> 
> > On Nov 16, 2021, at 1:17 AM, Michal Hocko <mhocko@xxxxxxxx> wrote:
> > 
> > On Tue 16-11-21 01:31:44, Alexey Makhalov wrote:
> > [...]
> >> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> >> index 6737b1cbf..bbc1a70d5 100644
> >> --- a/drivers/acpi/acpi_processor.c
> >> +++ b/drivers/acpi/acpi_processor.c
> >> @@ -200,6 +200,10 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr)
> >>  	 * gets online for the first time.
> >>  	 */
> >>  	pr_info("CPU%d has been hot-added\n", pr->id);
> >> +	{
> >> +		int nid = cpu_to_node(pr->id);
> >> +		printk("%s:%d cpu %d, node %d, online %d, ndata %p\n", __FUNCTION__, __LINE__, pr->id, nid, node_online(nid), NODE_DATA(nid));
> >> +	}
> >>  	pr->flags.need_hotplug_init = 1;
> > 
> > OK, IIUC you are adding a processor which is outside of
> > possible_cpu_mask, and that means that the node is not allocated for
> > such a future-to-be-hotplugged cpu and its memory node.
> > init_cpu_to_node would have done that initialization otherwise.
> 
> This is not correct.
> 
> possible_cpus is 128 for this VM. Look at the SRAT and percpu output for proof:
> [    0.085524] SRAT: PXM 127 -> APIC 0xfe -> Node 127
> [    0.118928] setup_percpu: NR_CPUS:128 nr_cpumask_bits:128 nr_cpu_ids:128 nr_node_ids:128

OK, I see. I had missed that when looking at the boot log you sent.

> It is impossible to add a processor outside of possible_cpu_mask.
> possible_cpus is the absolute maximum the system can support. See
> Documentation/core-api/cpu_hotplug.rst

That was my understanding, hence the suspicion you might be doing
something that is not really supported.

> The number of present and onlined CPUs (and nodes) is 4. The other 124
> CPUs (and nodes) are not present, but can potentially be hot added.

Yes, this is a configuration I have already seen. The cpu->node binding
was configured during boot time though, IIRC.

> The number of initialized nodes is 4, as init_cpu_to_node() will skip
> not yet present nodes, see arch/x86/mm/numa.c:798
> (numa_cpu_node(CPU #4) == NUMA_NO_NODE)

Isn't this the problem? Why is the cpu->node association missing here?

> 788 void __init init_cpu_to_node(void)
> 789 {
> 790 	int cpu;
> 791 	u16 *cpu_to_apicid = early_per_cpu_ptr(x86_cpu_to_apicid);
> 792 
> 793 	BUG_ON(cpu_to_apicid == NULL);
> 794 
> 795 	for_each_possible_cpu(cpu) {
> 796 		int node = numa_cpu_node(cpu);
> 797 
> 798 		if (node == NUMA_NO_NODE)
> 799 			continue;
> 800 
> 
> After CPU (and node) hot plug:
> - CPU 4 is marked as present, but not yet online
> - The new node got ID 4; numa_cpu_node(CPU #4) returns 4
> - node_online(4) == 0 and NODE_DATA(4) == NULL, but NODE_DATA(4) will be
>   accessed inside the for_each_possible_cpu loop in the percpu allocator
> 
> Digging further: even if the x86/CPU hot add maintainers decide to
> clean up the memoryless node hot add code to initialize the node at
> the time it is attached (to be aligned with the mm node behavior for
> memory hot add), this percpu fix is still needed, because the percpu
> allocator is used during node onlining. See the chicken-and-egg
> problem I described above.
> Or, as a 2nd option, numa_cpu_node(4) should return NUMA_NO_NODE until
> node 4 gets fully initialized.

I have to say I do not see the chicken-and-egg problem. As long as
init_cpu_to_node initializes the memoryless node for the cpu properly,
the pcp allocator doesn't really have to care, because the page
allocator falls back to the first populated node in distance order.

So I believe the whole issue boils down to addressing why
init_cpu_to_node doesn't see a proper cpu->node association.
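For illustration, a minimal sketch of what I have in mind, i.e.
init_cpu_to_node setting up even a memoryless node up front before
anything can dereference its NODE_DATA (memoryless_node_init() is a
hypothetical helper here, not an existing function):

	for_each_possible_cpu(cpu) {
		int node = numa_cpu_node(cpu);

		if (node == NUMA_NO_NODE)
			continue;

		/*
		 * Hypothetical: allocate the pg_data_t for a memoryless
		 * node before any cpu_to_node() user dereferences
		 * NODE_DATA(node). The page allocator would then fall
		 * back to the closest populated node on its own.
		 */
		if (!node_online(node))
			memoryless_node_init(node);

		numa_set_node(cpu, node);
	}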
-- 
Michal Hocko
SUSE Labs