Re: [PATCH][RFC] workqueue: Fix kernel panic on CPU hot-unplug

Hi Tejun,

On 2/2/24 18:29, Tejun Heo wrote:
Hello, Helge.

On Fri, Feb 02, 2024 at 09:41:38AM +0100, Helge Deller wrote:
In a second step I extended your patch to print the present
and online CPUs too. Below is the relevant dmesg part.

Note that on parisc the second CPU is activated later in the
boot process, after the kernel has the inventory.
I think this differs from x86, where all CPUs are available earlier
in the boot process.
...
[    0.000000] XXX workqueue_init_early: possible_cpus=ffff  present=0001  online=0001
...
[    0.228080] XXX workqueue_init: possible_cpus=ffff  present=0001  online=0001
...
[    0.263466] XXX workqueue_init_topology: possible_cpus=ffff  present=0001  online=0001

So, what's bothersome is that when the wq_dump.py script prints each cpu's
pwq, it only prints them for CPU 0 and 1. The for_each_possible_cpu() drgn
helper reads cpu_possible_mask from the kernel and iterates over it, so that
most likely indicates that at some point cpu_possible_mask becomes 0x3
instead of the 0xffff used during boot, which is problematic.
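
To make the two mask values concrete, here is a minimal sketch that decodes
a cpumask value into the CPU numbers it covers. The helper name
cpus_in_mask() is hypothetical and is not part of drgn or wq_dump.py; it only
illustrates the bit arithmetic behind 0xffff versus 0x3.

```python
def cpus_in_mask(mask):
    """Return the CPU numbers whose bits are set in an integer cpumask.

    Hypothetical helper for illustration only; bit N set means CPU N.
    """
    cpus = []
    bit = 0
    while mask:
        if mask & 1:
            cpus.append(bit)
        mask >>= 1
        bit += 1
    return cpus


print(cpus_in_mask(0xffff))  # the mask seen during boot: CPUs 0 through 15
print(cpus_in_mask(0x3))     # the mask implied by the script: CPUs 0 and 1
```

With 0xffff a walk over the possible CPUs visits sixteen CPUs, while with
0x3 it visits only two, which matches the script printing pwqs for CPU 0
and 1 only.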

Can you please sprinkle more printks to find out whether and when the
cpu_possible_mask changes during boot?
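
Once more printks are in place, the resulting dmesg could be scanned for the
first change with something like the sketch below. It assumes every debug
line keeps the "XXX <function>: possible_cpus=... present=... online=..."
format shown above; the function name possible_mask_changes() is made up for
this example.

```python
import re

# Matches the debug lines quoted above, e.g.
# [    0.228080] XXX workqueue_init: possible_cpus=ffff  present=0001  online=0001
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+XXX\s+(?P<where>\w+):\s+"
    r"possible_cpus=(?P<possible>[0-9a-f]+)\s+"
    r"present=(?P<present>[0-9a-f]+)\s+"
    r"online=(?P<online>[0-9a-f]+)"
)


def possible_mask_changes(dmesg_text):
    """Return (timestamp, function, old_mask, new_mask) for each change
    of the possible_cpus value between consecutive debug lines."""
    changes = []
    previous = None
    for line in dmesg_text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue  # not one of the XXX debug lines
        mask = int(m.group("possible"), 16)
        if previous is not None and mask != previous:
            changes.append((m.group("ts"), m.group("where"), previous, mask))
        previous = mask
    return changes


log = """\
[    0.000000] XXX workqueue_init_early: possible_cpus=ffff  present=0001  online=0001
[    0.228080] XXX workqueue_init: possible_cpus=ffff  present=0001  online=0001
[    0.263466] XXX workqueue_init_topology: possible_cpus=ffff  present=0001  online=0001
"""
print(possible_mask_changes(log))  # prints [] since the mask stays at ffff here
```

An empty result for the three lines above means the change has to happen
after workqueue_init_topology(), so that is where additional printks would
be most useful.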

It seems that commit 0921244f6f4f ("parisc: Only list existing CPUs in cpu_possible_mask")
is the culprit. Reverting that patch makes CPU hot-unplug work again.
Furthermore, this commit breaks the cpumask KUnit test, as reported by Guenter:
https://lkml.org/lkml/2024/2/4/146

So, I've added the revert to the parisc git tree and if my further tests
go well I'll push it upstream.

Thanks for your help!!
Helge




