Peter, any thoughts? I did confirm that reverting the commit fixed the issue. On 10/13/2021 10:32 AM, Qian Cai wrote: > > > On 9/29/2021 11:17 AM, Peter Zijlstra wrote: >> --- a/kernel/smp.c >> +++ b/kernel/smp.c >> @@ -1170,14 +1170,14 @@ void wake_up_all_idle_cpus(void) >> { >> int cpu; >> >> - preempt_disable(); >> + cpus_read_lock(); >> for_each_online_cpu(cpu) { >> - if (cpu == smp_processor_id()) >> + if (cpu == raw_smp_processor_id()) >> continue; >> >> wake_up_if_idle(cpu); >> } >> - preempt_enable(); >> + cpus_read_unlock(); > > Peter, it looks like this thing introduced a deadlock during CPU online/offline. > > [ 630.145166][ T129] WARNING: possible recursive locking detected > [ 630.151164][ T129] 5.15.0-rc5-next-20211013+ #145 Not tainted > [ 630.156988][ T129] -------------------------------------------- > [ 630.162984][ T129] cpuhp/21/129 is trying to acquire lock: > [ 630.168547][ T129] ffff800011f466d0 (cpu_hotplug_lock){++++}-{0:0}, at: wake_up_all_idle_cpus+0x40/0xe8 > wake_up_all_idle_cpus at /usr/src/linux-next/kernel/smp.c:1174 > [ 630.178040][ T129] > [ 630.178040][){++++}-{0:0}, at help us debug this: > [ 630.202292][ T129] Possible unsafe locking scenario: > [ 630.202292][ T129] > [ 630.209590][ T129] CPU0 > [ 630.212720][ T129] ---- > [ 630.215851][ T129] lock(cpu_hotplug_lock); > [ 630.220202][ T129] lock(cpu_hotplug_lock); > [ 630.224553][ T129] > [ 630.224553][ T129] *** DEADLOCK *** > [ 630.224553][ T129] > [ 630.232545][ T129] May be due to missing lock nesting notation > [ 630.232545][ T129] > [ 630.240711][ T129] 3 locks held by cpuhp/21/129: > [ 630.245406][ T129] #0: ffff800011f466d0 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0xe0/0x588 > [ 630.254976][ T129] #1: ffff800011f46780 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0xe0/0x588 > [ 630.264372][ T129] #2: ffff8000191fb9c8 (cpuidle_lock){+.+.}-{3:3}, at: cpuidle_pause_and_lock+0x24/0x38 > [ 630.274031][ T129] > [ 630.274031][ T129] stack backtrace: > [ 630.279767][ T129] CPU: 21 PID: 129 Comm: cpuhp/21 Not tainted 5.15.0-rc5-next-20211013+ #145 > [ 630.288371][ T129] Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020 > [ 630.296886][ T129] Call trace: > [ 630.300017][ T129] dump_backtrace+0x0/0x3b8 > [ 630.304369][ T129] show_stack+0x20/0x30 > [ 630.308371][ T129] dump_stack_lvl+0x8c/0xb8 > [ 630.312722][ T129] dump_stack+0x1c/0x38 > [ 630.316723][ T129] validate_chain+0x1d84/0x1da0 > [ 630.321421][ T129] __lock_acquire+0xab0/0x2040 > [ 630.326033][ T129] lock_acquire+0x32c/0xb08 > [ 630.330390][ T129] cpus_read_lock+0x94/0x308 > [ 630.334827][ T129] wake_up_all_idle_cpus+0x40/0xe8 > [ 630.339784][ T129] cpuidle_uninstall_idle_handler+0x3c/0x50 > [ 630.345524][ T129] cpuidle_pause_and_lock+0x28/0x38 > [ 630.350569][ T129] acpi_processor_hotplug+0xc0/0x170 > [ 630.355701][ T129] acpi_soft_cpu_online+0x124/0x250 > [ 630.360745][ T129] cpuhp_invoke_callback+0x51c/0x2ab8 > [ 630.365963][ T129] cpuhp_thread_fun+0x204/0x588 > [ 630.370659][ T129] smpboot_thread_fn+0x3f0/0xc40 > [ 630.375444][ T129] kthread+0x3d8/0x488 > [ 630.379360][ T129] ret_from_fork+0x10/0x20 > [ 863.525716][ T191] INFO: task cpuhp/21:129 blocked for more than 122 seconds. > [ 863.532954][ T191] Not tainted 5.15.0-rc5-next-20211013+ #145 > [ 863.539361][ T191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 863.547927][ T191] task:cpuhp/21 state:D stack:59104 pid: 129 ppid: 2 flags:0x00000008 > [ 863.557029][ T191] Call trace: > [ 863.560171][ T191] __switch_to+0x184/0x400 > [ 863.564448][ T191] __schedule+0x74c/0x1940 > [ 863.568753][ T191] schedule+0x110/0x318 > [ 863.572764][ T191] percpu_rwsem_wait+0x1b8/0x348 > [ 863.577592][ T191] __percpu_down_read+0xb0/0x148 > [ 863.582386][ T191] cpus_read_lock+0x2b0/0x308 > [ 863.586961][ T191] wake_up_all_idle_cpus+0x40/0xe8 > [ 863.591931][ T191] cpuidle_uninstall_idle_handler+0x3c/0x50 > [ 863.597716][ T191] cpuidle_pause_and_lock+0x28/0x38 > [ 863.602771][ T191] acpi_processor_hotplug+0xc0/0x170 > [ 863.607946][ T191] acpi_soft_cpu_online+0x124/0x250 > [ 863.613001][ T191] cpuhp_invoke_callback+0x51c/0x2ab8 > [ 863.618261][ T191] cpuhp_thread_fun+0x204/0x588 > [ 863.622967][ T191] smpboot_thread_fn+0x3f0/0xc40 > [ 863.627787][ T191] kthread+0x3d8/0x488 > [ 863.631712][ T191] ret_from_fork+0x10/0x20 > [ 863.636020][ T191] INFO: task kworker/0:2:189 blocked for more than 122 seconds. > [ 863.643500][ T191] Not tainted 5.15.0-rc5-next-20211013+ #145 > [ 863.649882][ T191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 863.658425][ T191] task:kworker/0:2 state:D stack:58368 pid: 189 ppid: 2 flags:0x00000008 > [ 863.667516][ T191] Workqueue: events vmstat_shepherd > [ 863.672573][ T191] Call trace: > [ 863.675731][ T191] __switch_to+0x184/0x400 > [ 863.680001][ T191] __schedule+0x74c/0x1940 > [ 863.684268][ T191] schedule+0x110/0x318 > [ 863.688295][ T191] percpu_rwsem_wait+0x1b8/0x348 > [ 863.693085][ T191] __percpu_down_read+0xb0/0x148 > [ 863.697892][ T191] cpus_read_lock+0x2b0/0x308 > [ 863.702421][ T191] vmstat_shepherd+0x5c/0x1a8 > [ 863.706977][ T191] process_one_work+0x808/0x19d0 > [ 863.711767][ T191] worker_thread+0x334/0xae0 > [ 863.716227][ T191] kthread+0x3d8/0x488 > [ 863.720149][ T191] ret_from_fork+0x10/0x20 > [ 863.724487][ T191] INFO: task lsbug:4642 blocked for more than 123 seconds. > [ 863.731565][ T191] Not tainted 5.15.0-rc5-next-20211013+ #145 > [ 863.737938][ T191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 863.746490][ T191] task:lsbug state:D stack:55536 pid: 4642 ppid: 4638 flags:0x00000008 > [ 863.755549][ T191] Call trace: > [ 863.758712][ T191] __switch_to+0x184/0x400 > [ 863.762984][ T191] __schedule+0x74c/0x1940 > [ 863.767286][ T191] schedule+0x110/0x318 > [ 863.771294][ T191] schedule_timeout+0x188/0x238 > [ 863.776016][ T191] wait_for_completion+0x174/0x290 > [ 863.780979][ T191] __cpuhp_kick_ap+0x158/0x1a8 > [ 863.785592][ T191] cpuhp_kick_ap+0x1f0/0x828 > [ 863.790053][ T191] bringup_cpu+0x180/0x1e0 > [ 863.794320][ T191] cpuhp_invoke_callback+0x51c/0x2ab8 > [ 863.799561][ T191] cpuhp_invoke_callback_range+0xa4/0x108 > [ 863.805130][ T191] cpu_up+0x528/0xd78 > [ 863.808982][ T191] cpu_device_up+0x4c/0x68 > [ 863.813249][ T191] cpu_subsys_online+0xc0/0x1f8 > [ 863.817972][ T191] device_online+0x10c/0x180 > [ 863.822413][ T191] online_store+0x10c/0x118 > [ 863.826791][ T191] dev_attr_store+0x44/0x78 > [ 863.831148][ T191] sysfs_kf_write+0xe8/0x138 > [ 863.835590][ T191] kernfs_fop_write_iter+0x26c/0x3d0 > [ 863.840745][ T191] new_sync_write+0x2bc/0x4f8 > [ 863.845275][ T191] vfs_write+0x714/0xcd8 > [ 863.849387][ T191] ksys_write+0xf8/0x1e0 > [ 863.853481][ T191] __arm64_sys_write+0x74/0xa8 > [ 863.858113][ T191] invoke_syscall.constprop.0+0xdc/0x1d8 > [ 863.863597][ T191] do_el0_svc+0xe4/0x298 > [ 863.867710][ T191] el0_svc+0x64/0x130 > [ 863.871545][ T191] el0t_64_sync_handler+0xb0/0xb8 > [ 863.876437][ T191] el0t_64_sync+0x180/0x184 > [ 863.880798][ T191] INFO: task mount:4682 blocked for more than 123 seconds. > [ 863.887881][ T191] Not tainted 5.15.0-rc5-next-20211013+ #145 > [ 863.894232][ T191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 863.902776][ T191] task:mount state:D stack:55856 pid: 4682 ppid: 1101 flags:0x00000000 > [ 863.911865][ T191] Call trace: > [ 863.915003][ T191] __switch_to+0x184/0x400 > [ 863.919296][ T191] __schedule+0x74c/0x1940 > [ 863.923564][ T191] schedule+0x110/0x318 > [ 863.927590][ T191] percpu_rwsem_wait+0x1b8/0x348 > [ 863.932380][ T191] __percpu_down_read+0xb0/0x148 > [ 863.937187][ T191] cpus_read_lock+0x2b0/0x308 > [ 863.941715][ T191] alloc_workqueue+0x730/0xd48 > [ 863.946357][ T191] loop_configure+0x2d4/0x1180 [loop] > [ 863.951592][ T191] lo_ioctl+0x5dc/0x1228 [loop] > [ 863.956321][ T191] blkdev_ioctl+0x258/0x820 > [ 863.960678][ T191] __arm64_sys_ioctl+0x114/0x180 > [ 863.965468][ T191] invoke_syscall.constprop.0+0xdc/0x1d8 > [ 863.970974][ T191] do_el0_svc+0xe4/0x298 > [ 863.975069][ T191] el0_svc+0x64/0x130 > [ 863.978922][ T191] el0t_64_sync_handler+0xb0/0xb8 > [ 863.983798][ T191] el0t_64_sync+0x180/0x184 > [ 863.988172][ T191] INFO: lockdep is turned off. >