Re: [6.13-rc0 regression] workqueue throwing cpu affinity warnings during CPU hotplug

Hello, Dave.

Sorry about the really late reply.

On Fri, Nov 22, 2024 at 11:38:19AM +1100, Dave Chinner wrote:
> Hi Tejun,
> 
> I just upgraded my test VMs from 6.12.0 to a current TOT kernel and
> I got several of these warnings while running fstests with CPU
> hotplug online/offline cycling concurrently with various tests:
> 
> [ 2508.109594] ------------[ cut here ]------------
> [ 2508.115669] WARNING: CPU: 23 PID: 133 at kernel/kthread.c:76 kthread_set_per_cpu+0x33/0x50
...
> [ 2508.253909]  <TASK>
> [ 2508.311972]  unbind_worker+0x1b/0x70
> [ 2508.315444]  workqueue_offline_cpu+0xd8/0x1f0
> [ 2508.319554]  cpuhp_invoke_callback+0x13e/0x4f0
> [ 2508.328936]  cpuhp_thread_fun+0xda/0x120
> [ 2508.332746]  smpboot_thread_fn+0x132/0x1d0
> [ 2508.336645]  kthread+0x147/0x170
> [ 2508.347646]  ret_from_fork+0x3e/0x50
> [ 2508.353845]  ret_from_fork_asm+0x1a/0x30
> [ 2508.357773]  </TASK>
> [ 2508.357776] ---[ end trace 0000000000000000 ]---

So, this is kthread saying that the task passed to it doesn't have
PF_KTHREAD set. There haven't been any related changes and the flag is
never cleared once set, so I don't see how that could happen for a
kworker.
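
For reference, the WARN that fires here is the PF_KTHREAD sanity check in
to_kthread(), which kthread_set_per_cpu() calls on entry. Roughly, from
kernel/kthread.c (simplified, with the rest of the function elided):

  static inline struct kthread *to_kthread(struct task_struct *k)
  {
          /* trips when the task was never marked as a kernel thread */
          WARN_ON(!(k->flags & PF_KTHREAD));
          return k->worker_private;
  }

  void kthread_set_per_cpu(struct task_struct *k, int cpu)
  {
          struct kthread *kthread = to_kthread(k);

          if (!kthread)
                  return;
          ...
  }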

> I have also seen similar traces from the CPUs coming on-line:
> 
> [ 2535.818771] WARNING: CPU: 23 PID: 133 at kernel/kthread.c:76 kthread_set_per_cpu+0x33/0x50
> ....
> [ 2535.969004] RIP: 0010:kthread_set_per_cpu+0x33/0x50
> ....
> [ 2508.249599] Call Trace:
> [ 2508.253909]  <TASK>
> [ 2535.969029]  workqueue_online_cpu+0xe6/0x2f0
> [ 2535.969032]  cpuhp_invoke_callback+0x13e/0x4f0
> [ 2535.969044]  cpuhp_thread_fun+0xda/0x120
> [ 2535.969047]  smpboot_thread_fn+0x132/0x1d0
> [ 2535.969053]  kthread+0x147/0x170
> [ 2535.969066]  ret_from_fork+0x3e/0x50
> [ 2535.969076]  ret_from_fork_asm+0x1a/0x30
> [ 2508.357773]  </TASK>

Yeah, this is the same.

> I didn't see these on 6.12.0, so I'm guessing that there is
> something in the merge window that has started triggering this.

I tried a few mixtures of stress-ng + continuous CPU hot [un]plugging but
couldn't reproduce this on the current linus#master. Do you still see it
happening?
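
For context, the hotplug side of what I was running amounts to a loop like
the one below. This is an illustrative sketch rather than the exact script
I used, and cpu1 is hardcoded purely for the example:

  /* Toggle cpu1 offline/online in a tight loop via sysfs; run as root. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  static int set_online(const char *val)
  {
          int fd = open("/sys/devices/system/cpu/cpu1/online", O_WRONLY);

          if (fd < 0) {
                  perror("open");
                  return -1;
          }
          /* the write blocks until the hotplug transition completes */
          if (write(fd, val, strlen(val)) < 0)
                  perror("write");
          close(fd);
          return 0;
  }

  int main(void)
  {
          for (;;) {
                  if (set_online("0") || set_online("1"))
                          return 1;
          }
  }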

Thanks.

-- 
tejun
