On 21.12.2022 12.11, Ladislav Michl wrote:
On Wed, Dec 21, 2022 at 11:58:42AM +0200, Mathias Nyman wrote:
On 21.12.2022 9.14, Ladislav Michl wrote:
+Cc: Sneeker Yeh
On Mon, Dec 19, 2022 at 10:45:43PM +0100, Ladislav Michl wrote:
On Mon, Dec 19, 2022 at 07:31:02PM +0100, Ladislav Michl wrote:
On Mon, Dec 19, 2022 at 02:25:46PM +0200, Mathias Nyman wrote:
It looks like the controller didn't complete the Stop Endpoint command.
The event for the last completed command (before the cycle bit changes from "c" to "C") was:
0x00000000028f55a0: TRB 00000000035e81a0 status 'Success' len 0 slot 1 ep 0 type 'Command Completion Event' flags e:c,
This was for the command at 35e81a0, which in the command ring was:
0x00000000035e81a0: Reset Endpoint Command: ctx 0000000000000000 slot 1 ep 3 flags T:c
The stop endpoint command was the next command queued, at 35e81b0:
0x00000000035e81b0: Stop Ring Command: slot 1 sp 0 ep 3 flags c
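For reference, here is how the dump above is read. Every xHCI TRB is 16 bytes, so the command following the Reset Endpoint at 35e81a0 sits at 35e81a0 + 0x10 = 35e81b0. The last valid event is the last entry whose cycle bit still matches the current cycle state ("c", i.e. 0, here). The following is a minimal Python sketch of both checks. It assumes a single-segment ring with no Link TRB in between. The helper names are made up for illustration, and the cycle-bit list is an invented example rather than data from the dump:

```python
from typing import List, Optional

TRB_SIZE = 16  # each xHCI TRB is four 32-bit dwords


def last_valid_event(cycle_bits: List[int], ccs: int) -> Optional[int]:
    """Index of the last event TRB the controller wrote on the current pass.

    Entries written on this pass carry the consumer cycle state (ccs); the
    first entry whose cycle bit differs is stale, left over from the previous
    pass. Assumes a single-segment ring dumped from its start.
    """
    last = None
    for i, bit in enumerate(cycle_bits):
        if bit != ccs:
            break
        last = i
    return last


def next_trb(addr: int) -> int:
    """Address of the TRB following addr in the same segment (ignores Link TRBs)."""
    return addr + TRB_SIZE


# Addresses from the dump: the last completion event points at the Reset
# Endpoint command, and the Stop Endpoint command is the next TRB on the ring.
completed_cmd = 0x035E81A0
stop_ep_cmd = 0x035E81B0
assert next_trb(completed_cmd) == stop_ep_cmd

# Illustrative ring: three events from this pass (cycle 0), then stale ones.
print(last_valid_event([0, 0, 0, 1, 1], ccs=0))  # 2
```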
There were a lot of URBs queued for this device, and they are cancelled one by one after the disconnect.
Was this the only device connected? If so, does connecting another USB device to another root port help?
Just to test whether the host, for some reason, partially stops a while after the last device disconnects.
The device is connected directly to the SoC. When it is connected through a hub, the host doesn't die
(as noted in another email; sorry for not replying to my own message, so it got lost).
It seems to be an intentional (power management?) optimization. If another device is
plugged in before the 5 s timeout expires, the host completes the Stop Endpoint command.
Unfortunately I cannot find anything describing this behavior in the
documentation, so I'll ask the manufacturer's support.
As support is usually slow, I asked a search engine first, and this sounds
familiar:
"Synopsis Designware USB3 IP earlier than v3.00a which is configured in silicon
with DWC_USB3_SUSPEND_ON_DISCONNECT_EN=1, would need a specific quirk to prevent
xhci host controller from dying when device is disconnected."
usb: dwc3: Add quirk for Synopsis device disconnection errata
https://patchwork.kernel.org/project/linux-omap/patch/1424151697-2084-5-git-send-email-Sneeker.Yeh@xxxxxxxxxxxxxx/
Any clue what happened with that? I haven't found any meaningful traces...
Let's step back a bit. All tests so far were done with the mainline 6.1.0 kernel.
I also tested Marvell's vendor trees, one based on 4.9.79 and a second on 5.4.30,
both heavily patched. The last version of the above patch I found is v5:
https://lkml.org/lkml/2015/2/21/260
I looked at that same series and turned patch 1/5 into a standalone quick hack that applies on top of 6.1.
It's untested; does it work for you?
Applied on top of your stop_endpoint_fixes, with 6.1.0 as the base tree:
[ 24.800835] xhci-hcd xhci-hcd.0.auto: Delay clearing port-1 CSC
[ 24.806788] usb 1-1: USB disconnect, device number 2
[ 28.148451] ieee80211 phy0: rt2x00usb_vendor_request: Error - Vendor Request 0x07 failed for offset 0x101c with error -19
[ 29.828466] xhci-hcd xhci-hcd.0.auto: xHCI host not responding to stop endpoint command
[ 29.856656] xhci-hcd xhci-hcd.0.auto: xHCI host controller not responding, assume dead
[ 29.864804] xhci-hcd xhci-hcd.0.auto: HC died; cleaning up
[ 29.949460] xhci-hcd xhci-hcd.0.auto: Late clearing port-1 CSC, portsc 0x202a0
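The portsc value in the last line can be decoded using the PORTSC field layout from the xHCI specification: CCS at bit 0, PED at bit 1, PLS at bits 8:5, PP at bit 9, and CSC at bit 17. Below is a small Python sketch. The constant names imitate the PORT_* style of the Linux xhci headers but are defined locally for this sketch only:

```python
PORT_CONNECT = 1 << 0      # CCS: Current Connect Status
PORT_PE = 1 << 1           # PED: Port Enabled/Disabled
PORT_PLS_SHIFT = 5         # PLS: Port Link State, bits 8:5
PORT_PLS_MASK = 0xF << PORT_PLS_SHIFT
PORT_POWER = 1 << 9        # PP: Port Power
PORT_CSC = 1 << 17         # CSC: Connect Status Change

# PLS encodings from the xHCI spec (12-14 are reserved)
LINK_STATES = {
    0: "U0", 1: "U1", 2: "U2", 3: "U3", 4: "Disabled", 5: "RxDetect",
    6: "Inactive", 7: "Polling", 8: "Recovery", 9: "Hot Reset",
    10: "Compliance", 11: "Test", 15: "Resume",
}


def decode_portsc(portsc: int) -> dict:
    """Split a PORTSC value into the fields relevant to connect/disconnect."""
    pls = (portsc & PORT_PLS_MASK) >> PORT_PLS_SHIFT
    return {
        "connected": bool(portsc & PORT_CONNECT),
        "enabled": bool(portsc & PORT_PE),
        "link_state": LINK_STATES.get(pls, f"reserved ({pls})"),
        "powered": bool(portsc & PORT_POWER),
        "connect_change": bool(portsc & PORT_CSC),
    }


print(decode_portsc(0x202A0))
# {'connected': False, 'enabled': False, 'link_state': 'RxDetect',
#  'powered': True, 'connect_change': True}
```

For 0x202a0, this shows no device connected, the link in RxDetect, port power on, and a Connect Status Change still pending. This is consistent with the "Late clearing port-1 CSC" message.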
What about checking whether anything is still connected on a command timeout,
and treating the controller as autosuspended instead of killing it?
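The following is a minimal sketch of the proposed check. It assumes the driver can read every root port's PORTSC when the Stop Endpoint command times out. TimeoutAction, any_port_connected() and on_stop_ep_timeout() are hypothetical names and are not existing xhci-hcd code:

```python
from enum import Enum
from typing import Iterable

PORT_CONNECT = 1 << 0  # CCS bit of PORTSC


class TimeoutAction(Enum):
    ASSUME_DEAD = "assume dead"                # current behaviour: "HC died"
    TREAT_AS_SUSPENDED = "treat as suspended"  # proposed: nothing attached


def any_port_connected(portsc_values: Iterable[int]) -> bool:
    """True if any root port reports Current Connect Status."""
    return any(v & PORT_CONNECT for v in portsc_values)


def on_stop_ep_timeout(portsc_values: Iterable[int]) -> TimeoutAction:
    """Hypothetical policy: only kill the host if a device is still attached."""
    if any_port_connected(portsc_values):
        return TimeoutAction.ASSUME_DEAD
    return TimeoutAction.TREAT_AS_SUSPENDED


print(on_stop_ep_timeout([0x202A0]))  # TimeoutAction.TREAT_AS_SUSPENDED
```

A host that stops processing commands while a device is attached is still declared dead. Only the case seen here, where nothing is left on the root ports, changes behavior.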
Agreed, we shouldn't kill it now that we know about this case.
Any idea what happens to the queued commands that were never handled?
Are they lost, or does the host continue processing them after a reconnect?
If the Stop Endpoint command times out with no device connected, we should
probably start by manually giving back the pending/cancelled URBs.
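The following is a minimal model of giving back the pending/cancelled URBs in software. It assumes the URBs are kept on a simple list. Urb and give_back_all() are hypothetical. Using -ESHUTDOWN as the completion status is also an assumption. In the kernel, each URB would presumably be completed through usb_hcd_giveback_urb():

```python
import errno
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Urb:
    ident: int
    status: int = -errno.EINPROGRESS  # still owned by the host controller driver


def give_back_all(pending: List[Urb], complete: Callable[[Urb], None],
                  status: int = -errno.ESHUTDOWN) -> int:
    """Complete every pending/cancelled URB in software, without waiting for
    the controller to process the Stop Endpoint command. Returns the count."""
    count = 0
    while pending:
        urb = pending.pop(0)
        urb.status = status
        complete(urb)  # stands in for usb_hcd_giveback_urb() in the kernel
        count += 1
    return count


done: List[Urb] = []
queue = [Urb(i) for i in range(3)]
print(give_back_all(queue, done.append))  # 3
```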
There will probably be a couple more commands queued after this, when the endpoints
are dropped and the USB device is freed (xHCI Disable Slot).
We need to figure out what to do with these.
The host still seems to respond to register writes even if it doesn't handle commands,
so entering suspend should be easier to tackle.
-Mathias
ladis