+Mario

Hi,

On 2/2/23 3:56 PM, Usama Arif wrote:
> From: David Woodhouse <dwmw@xxxxxxxxxxxx>
>
> Signed-off-by: David Woodhouse <dwmw@xxxxxxxxxxxx>
> ---

I'd like to nack this, but can't (and not because it doesn't have
commit text):

If I:

 - take dwmw2's parallel-6.2-rc6 branch (commit 459d1c46dbd1)
 - remove the set_cpu_bug(c, X86_BUG_NO_PARALLEL_BRINGUP) line from amd.c

Then:

 - a Ryzen 3000 (Picasso A1/Zen+) notebook I have access to fails to boot.
 - Zen 2,3,4-based servers boot fine
 - a Zen1-based server doesn't boot.  This is what's left on its
   serial port:

[ 3.199633] smp: Bringing up secondary CPUs ...
[ 3.200732] x86: Booting SMP configuration:
[ 3.204242] .... node #0, CPUs: #1
[ 3.204301] CPU 1 to 93/x86/cpu:kick in 63 21 -114014307645 0 . 0 0 0 0 . 0 114025055970
[ 3.204478] ------------[ cut here ]------------
[ 3.204481] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
[ 3.204490] Modules linked in:
[ 3.204493] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc6+ #19
[ 3.204496] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
[ 3.204498] RIP: 0010:cpu_init+0x2d/0x1f0
[ 3.204502] Code: e5 41 56 41 55 41 54 53 65 48 8b 1c 25 80 2e 1f 00 65 44 8b 35 20 e4 39 55 48 8b 05 5d f7 51 02 44 89 f2 f0 48 0f ab 10 73 06 <0f> 0b eb 02 f3 90 48 8b 05 3e f7 51 02 48 0f a3 10 73 f1 45 85 f6
[ 3.204504] RSP: 0000:ffffffffac803d70 EFLAGS: 00010083
[ 3.204506] RAX: ffff8d293eef6e40 RBX: ffff8d1d40010000 RCX: 0000000000000008
[ 3.204508] RDX: 0000000000000000 RSI: ffff8d1d1c40b048 RDI: ffffffffac566418
[ 3.204509] RBP: ffffffffac803d90 R08: 00000000fffffe14 R09: ffff8d1d1c406078
[ 3.204510] R10: ffffffffac803dc0 R11: 0000000000000000 R12: 0000000000000000
[ 3.204511] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 3.204512] FS: 0000000000000000(0000) GS:ffff8d1d1c400000(0000) knlGS:0000000000000000
[ 3.204514] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.204515] CR2: 0000000000000000 CR3: 0000800daec12000 CR4: 00000000003100a0
[ 3.204517] Call Trace:
[ 3.204519] ---[ end trace 0000000000000000 ]---
[ 3.204580] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: 0 APIC: 2
[ 3.288686] #2
[ 3.288735] CPU 2 to 93/x86/cpu:kick in 210 42 -114355248756 0 . 0 0 0 0 . 0 114356192013
[ 3.288798] ------------[ cut here ]------------
[ 3.288804] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
[ 3.288815] Modules linked in:
[ 3.288819] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.2.0-rc6+ #19
[ 3.288823] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
[ 3.288826] RIP: 0010:cpu_init+0x2d/0x1f0
[ 3.288831] Code: e5 41 56 41 55 41 54 53 65 48 8b 1c 25 80 2e 1f 00 65 44 8b 35 20 e4 39 55 48 8b 05 5d f7 51 02 44 89 f2 f0 48 0f ab 10 73 06 <0f> 0b eb 02 f3 90 48 8b 05 3e f7 51 02 48 0f a3 10 73 f1 45 85 f6
[ 3.288835] RSP: 0000:ffffffffac803d70 EFLAGS: 00010083
[ 3.288838] RAX: ffff8d293eef6e40 RBX: ffff8d1d40010000 RCX: 0000000000000008
[ 3.288841] RDX: 0000000000000000 RSI: ffff8d1d1c40b048 RDI: ffffffffac566418
[ 3.288844] RBP: ffffffffac803d90 R08: 00000000fffffe14 R09: ffff8d1d1c406078
[ 3.288845] R10: ffffffffac803dc0 R11: 0000000000000000 R12: 0000000000000000
[ 3.288848] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 3.288850] FS: 0000000000000000(0000) GS:ffff8d1d1c400000(0000) knlGS:0000000000000000
[ 3.288852] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.288855] CR2: 0000000000000000 CR3: 0000800daec12000 CR4: 00000000003100a0
[ 3.288857] Call Trace:
[ 3.288859] ---[ end trace 0000000000000000 ]---
[ 3.288925] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: 0 APIC: 8
6.36[ [ 3. 68 33]3 [ #3[ [ # [ 3.368623[ 3
[ 3.368623] #3
[ 3.368662] ------------[ cut here ]------------
[ 3.368673] CPU 3 to 93/x86/cpu:kick in 504 315 -114684508974 0 . 0 0 0 0 . 0 114685353594
[ 3.368705] BUG: scheduling while atomic: swapper/0/1/0x00000003
[ 3.368708] 7 locks held by swapper/0/1:
[ 3.368710] #0: ffffffffacbff920 (console_lock){....}-{0:0}, at: vprintk_emit+0x13a/0x2e0
[ 3.368721] #1: ffffffffacbffd48 (console_srcu){....}-{0:0}, at: console_flush_all+0x2d/0x250
[ 3.368728] #2: ffffffffac87f540 (console_owner){....}-{0:0}, at: console_emit_next_record.constprop.22+0x189/0x350
[ 3.368735] #3: ffffffffadaae838 (&port_lock_key){....}-{2:2}, at: serial8250_console_write+0x88/0x3c0
[ 3.368745] #4: ffffffffac86aa50 (cpu_add_remove_lock){....}-{3:3}, at: cpu_up+0x6a/0xd0
[ 3.368753] #5: ffffffffac86a9a0 (cpu_hotplug_lock){....}-{0:0}, at: _cpu_up+0x3d/0x2f0
[ 3.368760] #6: ffffffffac8763b0 (smpboot_threads_lock){....}-{3:3}, at: smpboot_create_threads+0x21/0x80
[ 3.368769] Modules linked in:
[ 3.368770] Preemption disabled at:
[ 3.368771] [<ffffffffaae717a4>] do_cpu_up+0x3e4/0x780
[ 3.368777] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.2.0-rc6+ #19
[ 3.368781] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
[ 3.368782] Call Trace:
[ 3.368783] <TASK>
[ 3.368789] dump_stack_lvl+0x49/0x63
[ 3.368795] ? do_cpu_up+0x3e4/0x780
[ 3.368799] dump_stack+0x10/0x16
[ 3.368802] __schedule_bug+0xad/0xd0
[ 3.368808] __schedule+0x76/0x8a0
[ 3.368812] ? sched_clock+0x9/0x10
[ 3.368817] ? sched_clock_local+0x17/0x90
[ 3.368826] ? sort_range+0x30/0x30
[ 3.368830] schedule+0x88/0xd0
[ 3.368833] schedule_timeout+0x40/0x320
[ 3.368840] ? __this_cpu_preempt_check+0x13/0x20
[ 3.368844] ? lock_release+0x353/0x3c0
[ 3.368852] ? sort_range+0x30/0x30
[ 3.368856] wait_for_completion_killable+0xe0/0x1c0
[ 3.368864] __kthread_create_on_node+0xfe/0x1e0
[ 3.368876] ? wait_for_completion_killable+0x38/0x1c0
[ 3.368884] kthread_create_on_node+0x46/0x70
[ 3.368894] kthread_create_on_cpu+0x2c/0x90
[ 3.368899] __smpboot_create_thread+0x87/0x140
[ 3.368905] smpboot_create_threads+0x3f/0x80
[ 3.368909] ? idle_thread_get+0x40/0x40
[ 3.368913] cpuhp_invoke_callback+0x13c/0x5d0
[ 3.368921] __cpuhp_invoke_callback_range+0x69/0xf0
[ 3.368929] _cpu_up+0x12a/0x2f0
[ 3.368937] cpu_up+0x8f/0xd0
[ 3.368942] bringup_nonboot_cpus+0x7c/0x160
[ 3.368950] smp_init+0x2a/0x83
[ 3.368957] kernel_init_freeable+0x1a1/0x309
[ 3.368961] ? lock_release+0x353/0x3c0
[ 3.368972] ? rest_init+0x140/0x140
[ 3.368977] kernel_init+0x1a/0x130
[ 3.368980] ret_from_fork+0x22/0x30
[ 3.368996] </TASK>
[ 3.369419]
[ 3.369420] .... node #1, CPUs: #4
[ 3.369466] ------------[ cut here ]------------
[ 3.369469] CPU 4 to 93/x86/cpu:kick in 378 42 -114685407543 0 . 0 0 0 0 . 0 114687022569
[ 3.369474] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
[ 3.369487] Modules linked in:
[ 3.369491] ------------[ cut here ]------------
[ 3.369494] DEBUG_LOCKS_WARN_ON(val > preempt_count())
[ 3.369493] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.2.0-rc6+ #19
[ 3.369499] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018

...which points to the WARN_ON here:

	static void wait_for_master_cpu(int cpu)
	{
	#ifdef CONFIG_SMP
		/*
		 * wait for ACK from master CPU before continuing
		 * with AP initialization
		 */
		WARN_ON(cpumask_test_and_set_cpu(cpu, cpu_initialized_mask));
		while (!cpumask_test_cpu(cpu, cpu_callout_mask))
			cpu_relax();
	#endif
	}

Let me know if you'd like me to test any changes.

Thanks,

Kim
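
P.S. For anyone reading along, here is a minimal userspace sketch of the condition that WARN_ON tests: the bit for `cpu` is set atomically, and the warning fires only if that bit was already set. This is not kernel code. The sketch_ names are hypothetical, a single 64-bit C11 atomic stands in for cpu_initialized_mask, and atomic_fetch_or() stands in for cpumask_test_and_set_cpu().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for cpu_initialized_mask: one bit per CPU, 64 CPUs max. */
#define SKETCH_MAX_CPUS 64

static atomic_ullong sketch_initialized_mask;

/*
 * Atomically set the bit for `cpu` and return whether it was already set.
 * A true return is the case in which the WARN_ON() above would fire,
 * i.e. some CPU has already claimed this CPU number.
 */
static bool sketch_test_and_set_cpu(unsigned int cpu)
{
	assert(cpu < SKETCH_MAX_CPUS);

	unsigned long long bit = 1ULL << cpu;
	unsigned long long old = atomic_fetch_or(&sketch_initialized_mask, bit);

	return (old & bit) != 0;
}
```

The first call for a given CPU number returns false. A second call with the same number returns true, which corresponds to the warning case.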