USB: workqueues stuck in 'D' state?

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Fri, 6 Dec 2024 15:07:10 -0800

So I'm not sure if this is new or not, but I *think* I would have
noticed it earlier.

On my Ampere Altra (128-core arm64 system), I started seeing 'top'
claiming a load average of roughly 2.3 even when idle, and it seems to
be all due to this:

  $ ps ax | grep ' [DR] '
    869 ?        D      0:00 [kworker/24:1+usb_hub_wq]
   1900 ?        D      0:00 [kworker/24:7+pm]

where sometimes there are multiple of those 'pm' workers.

Doing a sysrq-w, I get this:

  task:kworker/24:3    state:D stack:0     pid:1308  tgid:1308  ppid:2
     flags:0x00000008
  Workqueue: pm pm_runtime_work
  Call trace:
   __switch_to+0xf4/0x168 (T)
   __schedule+0x248/0x648
   schedule+0x3c/0xe0
   usleep_range_state+0x118/0x150
   xhci_hub_control+0xe80/0x1090
   rh_call_control+0x274/0x7a0
   usb_hcd_submit_urb+0x13c/0x3a0
   usb_submit_urb+0x1c8/0x600
   usb_start_wait_urb+0x7c/0x180
   usb_control_msg+0xcc/0x150
   usb_port_suspend+0x414/0x510
   usb_generic_driver_suspend+0x68/0x90
   usb_suspend_both+0x1c8/0x290
   usb_runtime_suspend+0x3c/0xb0
   __rpm_callback+0x50/0x1f0
   rpm_callback+0x70/0x88
   rpm_suspend+0xe8/0x5a8
   __pm_runtime_suspend+0x4c/0x130
   usb_runtime_idle+0x48/0x68
   rpm_idle+0xa4/0x358
   pm_runtime_work+0xb0/0xe0

  task:kworker/24:7    state:D stack:0     pid:1900  tgid:1900  ppid:2
     flags:0x00000208
  Workqueue: pm pm_runtime_work
  Call trace:
   __switch_to+0xf4/0x168 (T)
   __schedule+0x248/0x648
   schedule+0x3c/0xe0
   usleep_range_state+0x118/0x150
   xhci_hub_control+0xe80/0x1090
   rh_call_control+0x274/0x7a0
   usb_hcd_submit_urb+0x13c/0x3a0
   usb_submit_urb+0x1c8/0x600
   usb_start_wait_urb+0x7c/0x180
   usb_control_msg+0xcc/0x150
   usb_port_suspend+0x414/0x510
   usb_generic_driver_suspend+0x68/0x90
   usb_suspend_both+0x1c8/0x290
   usb_runtime_suspend+0x3c/0xb0
   __rpm_callback+0x50/0x1f0
   rpm_callback+0x70/0x88
   rpm_suspend+0xe8/0x5a8
   __pm_runtime_suspend+0x4c/0x130

so it seems to be all in that xhci_hub_control() path. I'm not seeing
anything that has changed in the xhci driver in this merge window, so
maybe this goes back further, and I just haven't noticed this odd load
average issue before.

The call trace for the usb_hub_wq seems a lot less stable, but I've
seen backtraces like

  task:kworker/24:1    state:D stack:0     pid:869   tgid:869   ppid:2
     flags:0x00000008
  Workqueue: usb_hub_wq hub_event
  Call trace:
   __switch_to+0xf4/0x168 (T)
   __schedule+0x248/0x648
   schedule+0x3c/0xe0
   schedule_preempt_disabled+0x2c/0x50
   __mutex_lock.constprop.0+0x478/0x968
   __mutex_lock_slowpath+0x1c/0x38
   mutex_lock+0x6c/0x88
   hub_event+0x144/0x4a0
   process_one_work+0x170/0x408
   worker_thread+0x2cc/0x400
   kthread+0xf4/0x108
   ret_from_fork+0x10/0x20

But also just

  Workqueue: usb_hub_wq hub_event
  Call trace:
   __switch_to+0xf4/0x168 (T)
   usb_control_msg+0xcc/0x150

or

  Workqueue: usb_hub_wq hub_event
  Call trace:
   __switch_to+0xf4/0x168 (T)
   __schedule+0x248/0x648
   schedule+0x3c/0xe0
   schedule_timeout+0x94/0x120
   msleep+0x30/0x50

so at a guess it's just some interaction with that 'pm' workqueue.

I did a reboot just to verify that yes, it happened again after a
fresh boot. So it is at least *somewhat* consistently repeatable,
although I wouldn't be surprised if it's some kind of timing-dependent
race condition that just happens to trigger on this machine.

I could try to see if it's so consistent that I could bisect it, but
before I start doing that, maybe just the backtraces makes somebody go
"Ahh, that smells like XYZ"..

Hmm?

             Linus