On Fri, Dec 06, 2024 at 03:07:10PM -0800, Linus Torvalds wrote: > So I'm not sure if this is new or not, but I *think* I would have > noticed it earlier. > > On my Ampere Altra (128-core arm64 system), I started seeing 'top' > claiming a load average of roughly 2.3 even when idle, and it seems to > be all due to this: > > $ ps ax | grep ' [DR] ' > 869 ? D 0:00 [kworker/24:1+usb_hub_wq] > 1900 ? D 0:00 [kworker/24:7+pm] > > where sometimes there are multiple of those 'pm' workers. > > Doing a sysrq-w, I get this: > > task:kworker/24:3 state:D stack:0 pid:1308 tgid:1308 ppid:2 > flags:0x00000008 > Workqueue: pm pm_runtime_work > Call trace: > __switch_to+0xf4/0x168 (T) > __schedule+0x248/0x648 > schedule+0x3c/0xe0 > usleep_range_state+0x118/0x150 > xhci_hub_control+0xe80/0x1090 > rh_call_control+0x274/0x7a0 > usb_hcd_submit_urb+0x13c/0x3a0 > usb_submit_urb+0x1c8/0x600 > usb_start_wait_urb+0x7c/0x180 > usb_control_msg+0xcc/0x150 > usb_port_suspend+0x414/0x510 > usb_generic_driver_suspend+0x68/0x90 > usb_suspend_both+0x1c8/0x290 > usb_runtime_suspend+0x3c/0xb0 > __rpm_callback+0x50/0x1f0 > rpm_callback+0x70/0x88 > rpm_suspend+0xe8/0x5a8 > __pm_runtime_suspend+0x4c/0x130 > usb_runtime_idle+0x48/0x68 > rpm_idle+0xa4/0x358 > pm_runtime_work+0xb0/0xe0 > > task:kworker/24:7 state:D stack:0 pid:1900 tgid:1900 ppid:2 > flags:0x00000208 > Workqueue: pm pm_runtime_work > Call trace: > __switch_to+0xf4/0x168 (T) > __schedule+0x248/0x648 > schedule+0x3c/0xe0 > usleep_range_state+0x118/0x150 > xhci_hub_control+0xe80/0x1090 > rh_call_control+0x274/0x7a0 > usb_hcd_submit_urb+0x13c/0x3a0 > usb_submit_urb+0x1c8/0x600 > usb_start_wait_urb+0x7c/0x180 > usb_control_msg+0xcc/0x150 > usb_port_suspend+0x414/0x510 > usb_generic_driver_suspend+0x68/0x90 > usb_suspend_both+0x1c8/0x290 > usb_runtime_suspend+0x3c/0xb0 > __rpm_callback+0x50/0x1f0 > rpm_callback+0x70/0x88 > rpm_suspend+0xe8/0x5a8 > __pm_runtime_suspend+0x4c/0x130 > > so it seems to be all in that xhci_hub_control() path. I'm not seeing > anything that has changed in the xhci driver in this merge window, so > maybe this goes back further, and I just haven't noticed this odd load > average issue before. > > The call trace for the usb_hub_wq seems a lot less stable, but I've > seen backtraces like > > task:kworker/24:1 state:D stack:0 pid:869 tgid:869 ppid:2 > flags:0x00000008 > Workqueue: usb_hub_wq hub_event > Call trace: > __switch_to+0xf4/0x168 (T) > __schedule+0x248/0x648 > schedule+0x3c/0xe0 > schedule_preempt_disabled+0x2c/0x50 > __mutex_lock.constprop.0+0x478/0x968 > __mutex_lock_slowpath+0x1c/0x38 > mutex_lock+0x6c/0x88 > hub_event+0x144/0x4a0 > process_one_work+0x170/0x408 > worker_thread+0x2cc/0x400 > kthread+0xf4/0x108 > ret_from_fork+0x10/0x20 > > But also just > > Workqueue: usb_hub_wq hub_event > Call trace: > __switch_to+0xf4/0x168 (T) > usb_control_msg+0xcc/0x150 > > or > > Workqueue: usb_hub_wq hub_event > Call trace: > __switch_to+0xf4/0x168 (T) > __schedule+0x248/0x648 > schedule+0x3c/0xe0 > schedule_timeout+0x94/0x120 > msleep+0x30/0x50 > > so at a guess it's just some interaction with that 'pm' workqueue. > > I did a reboot just to verify that yes, it happened again after a > fresh boot. So it is at least *somewhat* consistently repeatable, > although I wouldn't be surprised if it's some kind of timing-dependent > race condition that just happens to trigger on this machine. > > I could try to see if it's so consistent that I could bisect it, but > before I start doing that, maybe just the backtraces makes somebody go > "Ahh, that smells like XYZ".. I can't duplicate this here running on your latest tree at all (it's a "smaller" x86 box with only 64 cores.) So I don't know what to suggest. Are you using any USB devices on this thing, or is that just for internal connections? thanks, greg k-h