[Bug 219824] [6.13 regression] USB controller just died

https://bugzilla.kernel.org/show_bug.cgi?id=219824

--- Comment #22 from Artem S. Tashkinov (aros@xxxxxxx) ---
6.13.7 absolutely includes it:

https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.13.7

> commit 80cb8e694110dee4ac6fbf0956ba7439aeb0603d
> Author: Michal Pecio <michal.pecio@xxxxxxxxx>
> Date:   Tue Mar 4 13:31:47 2025 +0200
> 
>     usb: xhci: Fix host controllers "dying" after suspend and resume
>     
>     commit c7c1f3b05c67173f462d73d301d572b3f9e57e3b upstream.
>     
>     A recent cleanup went a bit too far and dropped clearing the cycle bit
>     of link TRBs, so it stays different from the rest of the ring half of
>     the time. Then a race occurs: if the xHC reaches such link TRB before
>     more commands are queued, the link's cycle bit unintentionally matches
>     the xHC's cycle so it follows the link and waits for further commands.
>     If more commands are queued before the xHC gets there, inc_enq() flips
>     the bit so the xHC later sees a mismatch and stops executing commands.
>     
>     This function is called before suspend, and 50% of the time after resuming
>     the xHC is doomed to get stuck sooner or later. Then some Stop Endpoint
>     command fails to complete in 5 seconds and this shows up:
>     
>     xhci_hcd 0000:00:10.0: xHCI host not responding to stop endpoint command
>     xhci_hcd 0000:00:10.0: xHCI host controller not responding, assume dead
>     xhci_hcd 0000:00:10.0: HC died; cleaning up
>     
>     followed by loss of all USB devices on the affected bus. That's if you
>     are lucky, because if Set Deq gets stuck instead, the failure is silent.
>     
>     Likely responsible for kernel bug 219824. I found this while searching
>     for possible causes of that regression and reproduced it locally before
>     hearing back from the reporter. To repro, simply wait for link cycle to
>     become set (debugfs), then suspend, resume and wait. To accelerate the
>     failure I used a script which repeatedly starts and stops a UVC camera.
>     
>     Some HCs get fully reinitialized on resume and they are not affected.
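
For anyone trying to follow the race described in the commit message, here is a
toy model of it. This is only a sketch under simplifying assumptions, not the
driver's code: the ring is a single segment with three command slots plus one
link TRB, the whole enqueue path is collapsed into one method, and the names
ToyRing, queue_command, run_xhc, clear and scenario are made up for
illustration. The clear_link_cycle flag models whether the pre-suspend reset
also clears the link TRB's cycle bit, which is what the commit message says the
cleanup had dropped.

```python
class ToyRing:
    """Toy single-segment command ring: size - 1 command slots plus one link TRB."""

    def __init__(self, size=4):
        self.size = size
        self.link = size - 1        # index of the link TRB (last entry)
        self.cycle = [0] * size     # cycle bit of every TRB, link included
        self.enq = 0                # driver's enqueue index
        self.deq = 0                # xHC's dequeue index
        self.pcs = 1                # producer cycle state (driver side)
        self.ccs = 1                # consumer cycle state (xHC side)
        self.completed = 0          # commands executed by the xHC

    def queue_command(self):
        """Driver: write one command TRB, wrapping through the link TRB if needed.

        Simplification: no ring-full check; the scenarios below never overrun.
        """
        if self.enq == self.link:
            # Hand the link TRB to the xHC by flipping its cycle bit. This only
            # works if the bit currently differs from the producer cycle state.
            self.cycle[self.link] ^= 1
            self.pcs ^= 1
            self.enq = 0
        self.cycle[self.enq] = self.pcs
        self.enq += 1

    def run_xhc(self):
        """xHC: consume TRBs for as long as their cycle bit matches ccs."""
        while self.cycle[self.deq] == self.ccs:
            if self.deq == self.link:
                self.ccs ^= 1       # link TRB toggles the consumer cycle state
                self.deq = 0
            else:
                self.completed += 1
                self.deq += 1

    def clear(self, clear_link_cycle):
        """Pre-suspend reset: zero the command TRBs and rewind both sides."""
        for i in range(self.link):
            self.cycle[i] = 0
        if clear_link_cycle:
            self.cycle[self.link] = 0
        self.enq = self.deq = 0
        self.pcs = self.ccs = 1
        self.completed = 0


def scenario(clear_link_cycle, xhc_reaches_link_first):
    """Return how many of 4 post-resume commands the toy xHC completes."""
    ring = ToyRing(size=4)
    # One full lap before suspend leaves the link TRB's cycle bit set to 1.
    for _ in range(4):
        ring.queue_command()
        ring.run_xhc()
    ring.clear(clear_link_cycle)

    # After resume: two commands are queued and executed right away, the third
    # is queued but the xHC has not yet processed it.
    for _ in range(2):
        ring.queue_command()
        ring.run_xhc()
    ring.queue_command()

    if xhc_reaches_link_first:
        ring.run_xhc()          # xHC reaches the link TRB before the driver wraps
    ring.queue_command()        # fourth command: driver flips the link TRB and wraps
    ring.run_xhc()
    return ring.completed


if __name__ == "__main__":
    for clear_link in (False, True):
        for xhc_first in (True, False):
            done = scenario(clear_link, xhc_first)
            print(f"clear_link_cycle={clear_link} xhc_first={xhc_first}: {done}/4 completed")
```

In this model, only the combination of a stale link cycle bit and the driver
wrapping before the xHC reaches the link TRB leaves a command unexecuted; the
toy xHC stops at the link TRB and never sees the fourth command. The other three
combinations complete all four commands.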
