Hi,

I just found another problem. When passing "mem=256" to 2.6.37-rc1, it
dies hard early (not able to print any boot log). With this patch
applied, it's a bit better: it shows a kernel panic, but still dies hard
(not able to reboot with "panic=10").

Attached is the screenshot in kvm (it's not specific to kvm, it dies
hard on two more physical boxes). The screenshot shows that it panics
inside reserve_trampoline_memory().

Thanks,
Fengguang

On Sun, Nov 14, 2010 at 09:38:41AM +0800, Yinghai Lu wrote:
>
> Recent Intel new system have different order in MADT, aka will list all thread0
> at first, then all thread1.
> But SRAT table still old order, it will list cpus in one socket all together.
>
> If the user have compiled limited NR_CPUS or boot with nr_cpus=, could have missed
> to put some cpus apic id to node mapping into apicid_to_node[].
>
> for example for 4 sockets system with 64 cpus with nr_cpus=32 will get crash...
>
> [ 9.106288] Total of 32 processors activated (136190.88 BogoMIPS).
> [ 9.235021] divide error: 0000 [#1] SMP
> [ 9.235315] last sysfs file:
> [ 9.235481] CPU 1
> [ 9.235592] Modules linked in:
> [ 9.245398]
> [ 9.245478] Pid: 2, comm: kthreadd Not tainted 2.6.37-rc1-tip-yh-01782-ge92ef79-dirty #274 /Sun Fire x4800
> [ 9.265415] RIP: 0010:[<ffffffff81075a8f>] [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
> [ 9.265835] RSP: 0000:ffff88103f8d1c40 EFLAGS: 00010046
> [ 9.285550] RAX: 0000000000000000 RBX: ffff88103f887de0 RCX: 0000000000000000
> [ 9.305356] RDX: 0000000000000000 RSI: 0000000000000200 RDI: 0000000000000200
> [ 9.305711] RBP: ffff88103f8d1d10 R08: 0000000000000200 R09: ffff88103f887e38
> [ 9.325709] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
> [ 9.326038] R13: ffff88107e80dfb0 R14: 0000000000000001 R15: ffff88103f887e40
> [ 9.345655] FS: 0000000000000000(0000) GS:ffff88107e800000(0000) knlGS:0000000000000000
> [ 9.365503] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 9.365776] CR2: 0000000000000000 CR3: 0000000002417000 CR4: 00000000000006e0
> [ 9.385583] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 9.405368] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 9.405713] Process kthreadd (pid: 2, threadinfo ffff88103f8d0000, task ffff88305c8aa2d0)
> [ 9.425563] Stack:
> [ 9.425668] ffff88103f8d1cb0 0000000000000046 0000000000000000 0000000200000000
> [ 9.445509] 0000000000000000 0000000100000000 0000000000000046 ffffffff82bd1ce0
> [ 9.465350] 000000015c8aa2d0 00000000001d2540 00000000001d2540 0000007d3f8d1d28
> [ 9.465763] Call Trace:
> [ 9.465875] [<ffffffff810747c3>] wake_up_new_task+0x3c/0x10e
> [ 9.485486] [<ffffffff8107b2e3>] do_fork+0x28c/0x35f
> [ 9.485753] [<ffffffff810ab832>] ? __lock_acquire+0x1801/0x1813
> [ 9.505474] [<ffffffff8106f2bd>] ? finish_task_switch+0x80/0xf4
> [ 9.525264] [<ffffffff8106f286>] ? finish_task_switch+0x49/0xf4
> [ 9.525575] [<ffffffff8109da72>] ? local_clock+0x2b/0x3c
> [ 9.545281] [<ffffffff8103da76>] kernel_thread+0x70/0x72
> [ 9.545544] [<ffffffff81097c83>] ? kthread+0x0/0xa8
> [ 9.545797] [<ffffffff81037990>] ? kernel_thread_helper+0x0/0x10
> [ 9.565519] [<ffffffff81098099>] kthreadd+0xe8/0x12b
> [ 9.585185] [<ffffffff81037994>] kernel_thread_helper+0x4/0x10
> [ 9.585485] [<ffffffff81cd793c>] ? restore_args+0x0/0x30
> [ 9.605192] [<ffffffff81097fb1>] ? kthreadd+0x0/0x12b
> [ 9.605479] [<ffffffff81037990>] ? kernel_thread_helper+0x0/0x10
> [ 9.625295] Code: a0 be 00 02 00 00 ff c2 48 63 d2 e8 f8 67 3b 00 3b 05 86 8e 52 01 48 89 c7 89 45 c8 7c c1 48 8b 45 b0 8b 4b 08 31 d2 48 c1 e0 0a <48> f7 f1 45 85 e4 75 08 48 3b 45 b8 72 08 eb 0d 48 89 45 a8 eb
> [ 9.645938] RIP [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
> [ 9.665356] RSP <ffff88103f8d1c40>
> [ 9.665568] ---[ end trace 2296156d35fdfc87 ]---
>
> So let just parse all cpu entries in SRAT.
>
> Also add apicid checking with MAX_LOCAL_APIC, in case We could out of boundaries of
> apicid_to_node[].
>
> it should fix following bug too.
> https://bugzilla.kernel.org/show_bug.cgi?id=22662
>
> Reported-and-Tested-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
> Reported-by: Bjorn Helgaas <bjorn.helgaas@xxxxxx>
> Signed-off-by: Yinghai Lu <yinghai@xxxxxxxxxx>
>
> ---
>  arch/x86/kernel/acpi/boot.c |    7 +++++++
>  arch/x86/mm/srat_64.c       |    8 ++++++++
>  drivers/acpi/numa.c         |   14 ++++++++++++--
>  3 files changed, 27 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/arch/x86/kernel/acpi/boot.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/acpi/boot.c
> +++ linux-2.6/arch/x86/kernel/acpi/boot.c
> @@ -198,6 +198,13 @@ static void __cpuinit acpi_register_lapi
>  {
>  	unsigned int ver = 0;
>
> +#ifdef CONFIG_X86_64
> +	if (id >= (MAX_APICS-1)) {
> +		printk(KERN_INFO PREFIX "skipped apicid that is too big\n");
> +		return;
> +	}
> +#endif
> +
>  	if (!enabled) {
>  		++disabled_cpus;
>  		return;
> Index: linux-2.6/arch/x86/mm/srat_64.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/mm/srat_64.c
> +++ linux-2.6/arch/x86/mm/srat_64.c
> @@ -134,6 +134,10 @@ acpi_numa_x2apic_affinity_init(struct ac
>  	}
>
>  	apic_id = pa->apic_id;
> +	if (apic_id >= MAX_LOCAL_APIC) {
> +		printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u skipped that apicid too big\n", pxm, apic_id, node);
> +		return;
> +	}
>  	apicid_to_node[apic_id] = node;
>  	node_set(node, cpu_nodes_parsed);
>  	acpi_numa = 1;
> @@ -168,6 +172,10 @@ acpi_numa_processor_affinity_init(struct
>  		apic_id = (pa->apic_id << 8) | pa->local_sapic_eid;
>  	else
>  		apic_id = pa->apic_id;
> +	if (apic_id >= MAX_LOCAL_APIC) {
> +		printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u skipped apicid that is too big\n", pxm, apic_id, node);
> +		return;
> +	}
>  	apicid_to_node[apic_id] = node;
>  	node_set(node, cpu_nodes_parsed);
>  	acpi_numa = 1;
> Index: linux-2.6/drivers/acpi/numa.c
> ===================================================================
> --- linux-2.6.orig/drivers/acpi/numa.c
> +++ linux-2.6/drivers/acpi/numa.c
> @@ -275,13 +275,23 @@ acpi_table_parse_srat(enum acpi_srat_typ
>  int __init acpi_numa_init(void)
>  {
>  	int ret = 0;
> +	int nr_cpu_entries = nr_cpu_ids;
> +
> +#ifdef CONFIG_X86_64
> +	/*
> +	 * Should not limit number with cpu num that will handle,
> +	 * SRAT cpu entries could have different order with that in MADT.
> +	 * So go over all cpu entries in SRAT to get apicid to node mapping.
> +	 */
> +	nr_cpu_entries = MAX_LOCAL_APIC;
> +#endif
>
>  	/* SRAT: Static Resource Affinity Table */
>  	if (!acpi_table_parse(ACPI_SIG_SRAT, acpi_parse_srat)) {
>  		acpi_table_parse_srat(ACPI_SRAT_TYPE_X2APIC_CPU_AFFINITY,
> -					   acpi_parse_x2apic_affinity, nr_cpu_ids);
> +					   acpi_parse_x2apic_affinity, nr_cpu_entries);
>  		acpi_table_parse_srat(ACPI_SRAT_TYPE_CPU_AFFINITY,
> -					   acpi_parse_processor_affinity, nr_cpu_ids);
> +					   acpi_parse_processor_affinity, nr_cpu_entries);
>  		ret = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
>  					    acpi_parse_memory_affinity,
>  					    NR_NODE_MEMBLKS);
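
For anyone following the thread, the range check that the quoted patch
adds in front of the apicid_to_node[] store can be sketched in user
space roughly as below. This is only an illustration under assumptions,
not the kernel code: MAX_LOCAL_APIC is shrunk to a made-up value,
NUMA_NO_NODE is a local stand-in, and init_apicid_map() and
set_apicid_node() are hypothetical helper names that do not exist in
the tree.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative stand-ins only; the real values live in the kernel headers. */
#define MAX_LOCAL_APIC 8
#define NUMA_NO_NODE   (-1)

/* User-space model of the apic id -> node table filled from SRAT. */
static int apicid_to_node[MAX_LOCAL_APIC];

/* Hypothetical helper: mark every apic id as "no node known yet". */
static void init_apicid_map(void)
{
	for (int i = 0; i < MAX_LOCAL_APIC; i++)
		apicid_to_node[i] = NUMA_NO_NODE;
}

/*
 * Hypothetical helper: record one SRAT-style (apic_id, node) entry.
 * Returns 0 on success, or -1 when apic_id would index past the table,
 * in which case the entry is skipped instead of written.
 */
static int set_apicid_node(unsigned int apic_id, int node)
{
	assert(node >= 0);	/* callers are expected to pass a real node */

	if (apic_id >= MAX_LOCAL_APIC) {
		printf("SRAT: APIC 0x%02x -> Node %d skipped, apicid too big\n",
		       apic_id, node);
		return -1;
	}
	apicid_to_node[apic_id] = node;
	return 0;
}
```

Calling init_apicid_map() once and then set_apicid_node() for every
SRAT cpu entry, regardless of nr_cpus=, mirrors the idea in the patch:
parse all entries, but never write outside the table.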
Attachment:
panic-reserve_trampoline_memory.png
Description: PNG image