Re: [PATCH][RFC] workqueue: Fix kernel panic on CPU hot-unplug

On 1/31/24 23:28, Tejun Heo wrote:
On Wed, Jan 31, 2024 at 08:27:45PM +0100, Helge Deller wrote:
When hot-unplugging a 32-bit CPU on the parisc platform with
"chcpu -d 1", I get the following kernel panic. Adding a check
for !pwq prevents the panic.

  Kernel Fault: Code=26 (Data memory access rights trap) at addr 00000000
  CPU: 1 PID: 21 Comm: cpuhp/1 Not tainted 6.8.0-rc1-32bit+ #1291
  Hardware name: 9000/778/B160L

  IASQ: 00000000 00000000 IAOQ: 10446db4 10446db8
   IIR: 0f80109c    ISR: 00000000  IOR: 00000000
   CPU:        1   CR30: 11dd1710 CR31: 00000000
   IAOQ[0]: wq_update_pod+0x98/0x14c
   IAOQ[1]: wq_update_pod+0x9c/0x14c
   RP(r2): wq_update_pod+0x80/0x14c
  Backtrace:
   [<10448744>] workqueue_offline_cpu+0x1d4/0x1dc
   [<10429db4>] cpuhp_invoke_callback+0xf8/0x200
   [<1042a1d0>] cpuhp_thread_fun+0xb8/0x164
   [<10452970>] smpboot_thread_fn+0x284/0x288
   [<1044d8f4>] kthread+0x12c/0x13c
   [<1040201c>] ret_from_kernel_thread+0x1c/0x24
  Kernel panic - not syncing: Kernel Fault

Signed-off-by: Helge Deller <deller@xxxxxx>

---

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 76e60faed892..dfeee7b7322c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4521,6 +4521,8 @@ static void wq_update_pod(struct workqueue_struct *wq, int cpu,
  	wq_calc_pod_cpumask(target_attrs, cpu, off_cpu);
  	pwq = rcu_dereference_protected(*per_cpu_ptr(wq->cpu_pwq, cpu),
  					lockdep_is_held(&wq_pool_mutex));
+	if (!pwq)
+		return;

Hmm... I have a hard time imagining a scenario where some CPUs don't have
pwq installed on wq->cpu_pwq. Can you please run `drgn
tools/workqueue/wq_dump.py` before triggering the hotplug event and paste
the output along with full dmesg?

I'm not sure if parisc is already fully supported with that tool, or
if I'm doing something wrong:

root@debian:~# uname -a
Linux debian 6.8.0-rc1-32bit+ #1292 SMP PREEMPT Thu Feb  1 11:31:38 CET 2024 parisc GNU/Linux

root@debian:~# drgn --main-symbols -s ./vmlinux ./wq_dump.py
Traceback (most recent call last):
  File "/usr/bin/drgn", line 33, in <module>
    sys.exit(load_entry_point('drgn==0.0.25', 'console_scripts', 'drgn')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/drgn/cli.py", line 301, in _main
    runpy.run_path(script, init_globals={"prog": prog}, run_name="__main__")
  File "<frozen runpy>", line 291, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "./wq_dump.py", line 78, in <module>
    worker_pool_idr         = prog['worker_pool_idr']
                              ~~~~^^^^^^^^^^^^^^^^^^^
KeyError: 'worker_pool_idr'

Do you have an idea what is going wrong here? I'll check further, but
otherwise it's probably easier for me to add some printk() calls to
wq_update_pod() and send you that output instead.
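
A minimal sketch for narrowing down the KeyError in the traceback above. It assumes the failure means drgn could not resolve `worker_pool_idr` from the loaded debug info, which has not been confirmed in this thread. The helper name `probe_symbols` and the list of names are illustrative and are not part of wq_dump.py. It relies only on `prog[name]` raising KeyError for unknown names, as the traceback shows.

```python
def probe_symbols(prog, names):
    """Split names into (found, missing) by trying prog[name] on each.

    `prog` only has to support item lookup and raise KeyError (or another
    LookupError) for unknown names, as seen in the traceback.
    """
    found, missing = [], []
    for name in names:
        try:
            prog[name]
        except LookupError:
            missing.append(name)
        else:
            found.append(name)
    return found, missing


# Names taken from this thread; wq_dump.py may need others as well.
NAMES = ["worker_pool_idr", "wq_update_pod", "workqueue_offline_cpu"]

if __name__ == "__main__":
    # The drgn CLI passes `prog` via init_globals (see the traceback).
    target = globals().get("prog")
    if target is None:
        print("no 'prog' object found; run this script through the drgn CLI")
    else:
        found, missing = probe_symbols(target, NAMES)
        print("found:  ", ", ".join(found) or "(none)")
        print("missing:", ", ".join(missing) or "(none)")
```

If every name is reported missing, the debug info itself is probably not being loaded. If only `worker_pool_idr` is missing, the problem is more likely specific to that variable.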

Helge




