> On Nov 16, 2021, at 1:17 AM, Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Tue 16-11-21 01:31:44, Alexey Makhalov wrote:
> [...]
>> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
>> index 6737b1cbf..bbc1a70d5 100644
>> --- a/drivers/acpi/acpi_processor.c
>> +++ b/drivers/acpi/acpi_processor.c
>> @@ -200,6 +200,10 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr)
>>  	 * gets online for the first time.
>>  	 */
>>  	pr_info("CPU%d has been hot-added\n", pr->id);
>> +	{
>> +		int nid = cpu_to_node(pr->id);
>> +		printk("%s:%d cpu %d, node %d, online %d, ndata %p\n", __FUNCTION__, __LINE__, pr->id, nid, node_online(nid), NODE_DATA(nid));
>> +	}
>>  	pr->flags.need_hotplug_init = 1;
>
> OK, IIUC you are adding a processor which is outside of
> possible_cpu_mask and that means that the node is not allocated for such
> a future to be hotplugged cpu and its memory node. init_cpu_to_node
> would have done that initialization otherwise.

That is not correct. possible_cpus is 128 for this VM. See the SRAT and percpu output for proof:

[    0.085524] SRAT: PXM 127 -> APIC 0xfe -> Node 127
[    0.118928] setup_percpu: NR_CPUS:128 nr_cpumask_bits:128 nr_cpu_ids:128 nr_node_ids:128

It is impossible to add a processor outside of possible_cpu_mask: possible_cpus is the absolute maximum the system can support. See Documentation/core-api/cpu_hotplug.rst.

The number of present and online CPUs (and nodes) is 4. The other 124 CPUs (and nodes) are not present, but can potentially be hot-added.
The number of initialized nodes is 4, because init_cpu_to_node() skips nodes that are not yet present; see arch/x86/mm/numa.c:798 (numa_cpu_node(CPU #4) == NUMA_NO_NODE):

788 void __init init_cpu_to_node(void)
789 {
790 	int cpu;
791 	u16 *cpu_to_apicid = early_per_cpu_ptr(x86_cpu_to_apicid);
792
793 	BUG_ON(cpu_to_apicid == NULL);
794
795 	for_each_possible_cpu(cpu) {
796 		int node = numa_cpu_node(cpu);
797
798 		if (node == NUMA_NO_NODE)
799 			continue;
800

After the CPU (and node) hot plug:
- CPU 4 is marked as present, but is not yet online.
- The new node gets ID 4, so numa_cpu_node(CPU #4) returns 4.
- node_online(4) == 0 and NODE_DATA(4) == NULL, but NODE_DATA(4) will be accessed inside the for_each_possible_cpu loop during percpu allocation.

Digging further.

Even if the x86/CPU hot-add maintainers decide to clean up the memoryless-node hot-add code so that the node is initialized at the time it is attached (consistent with how mm initializes a node during memory hot-add), this percpu fix is still needed, because percpu allocation is used during node onlining. See the chicken-and-egg problem that I described above.

Alternatively, as a second option, numa_cpu_node(4) should return NUMA_NO_NODE until node 4 is fully initialized.

Regards,
Alexey