Re: xHCI host dies on device unplug

On Wed, Dec 21, 2022 at 02:12:03PM +0200, Mathias Nyman wrote:
> On 21.12.2022 12.11, Ladislav Michl wrote:
> > On Wed, Dec 21, 2022 at 11:58:42AM +0200, Mathias Nyman wrote:
> > > On 21.12.2022 9.14, Ladislav Michl wrote:
> > > > +Cc: Sneeker Yeh
> > > > 
> > > > On Mon, Dec 19, 2022 at 10:45:43PM +0100, Ladislav Michl wrote:
> > > > > On Mon, Dec 19, 2022 at 07:31:02PM +0100, Ladislav Michl wrote:
> > > > > > On Mon, Dec 19, 2022 at 02:25:46PM +0200, Mathias Nyman wrote:
> > > > > > > 
> > > > > > > Looks like controller didn't complete the stop endpoint command.
> > > > > > > 
> > > > > > > Event for last completed command (before cycle bit change "c" -> "C") was:
> > > > > > >     0x00000000028f55a0: TRB 00000000035e81a0 status 'Success' len 0 slot 1 ep 0 type 'Command Completion Event' flags e:c,
> > > > > > > 
> > > > > > > This was for command at 35e81a0, which in the command ring was:
> > > > > > >     0x00000000035e81a0: Reset Endpoint Command: ctx 0000000000000000 slot 1 ep 3 flags T:c
> > > > > > > 
> > > > > > > The stop endpoint command was the next command queued, at 35e81b0:
> > > > > > >     0x00000000035e81b0: Stop Ring Command: slot 1 sp 0 ep 3 flags c
> > > > > > > 
> > > > > > > There were a lot of URBs queued for this device, and they are cancelled one by one after disconnect.
> > > > > > > 
> > > > > > > Was this the only device connected? If so does connecting another usb device to another root port help?
> > > > > > > Just to test if the host for some reason partially stops a while after last device disconnect?
> > > > > > 
> > > > > > Device is connected directly into SoC. Once connected into HUB, host doesn't die
> > > > > > (as noted in other email, sorry for not replying to my own message, so it got lost)
> > > > > > It seems as intentional (power management?) optimization. If another device is
> > > > > > plugged in before 5 sec timeout expires, host completes stop endpoint command.
> > > > > > 
> > > > > > Unfortunately I cannot find anything describing this behavior in
> > > > > > documentation, so I'll ask manufacturer support.
> > > > > 
> > > > > As support is usually slow I asked search engine first and this sounds
> > > > > familiar:
> > > > > "Synopsis Designware USB3 IP earlier than v3.00a which is configured in silicon
> > > > > with DWC_USB3_SUSPEND_ON_DISCONNECT_EN=1, would need a specific quirk to prevent
> > > > > xhci host controller from dying when device is disconnected."
> > > > > 
> > > > > usb: dwc3: Add quirk for Synopsis device disconnection errata
> > > > > https://patchwork.kernel.org/project/linux-omap/patch/1424151697-2084-5-git-send-email-Sneeker.Yeh@xxxxxxxxxxxxxx/
> > > > > 
> > > > > Any clue what happened with that? I haven't found any meaningful traces...
> > > > 
> > > > Let's step back a bit. All tests so far were done with the mainline 6.1.0 kernel.
> > > > I also tested Marvell's vendor tree, one based on 4.9.79, second on 5.4.30,
> > > > both heavily patched. The last version of above patch I found is v5:
> > > > https://lkml.org/lkml/2015/2/21/260
> > > > 
> > > 
> > > Looked at that same series and turned patch 1/5 into a standalone quick hack that applies on 6.1
> > > 
> > > Untested, does it work for you?
> > 
> > Applied on top of your stop_endpoint_fixes, with 6.1.0 as the base tree:
> > [   24.800835] xhci-hcd xhci-hcd.0.auto: Delay clearing port-1 CSC
> > [   24.806788] usb 1-1: USB disconnect, device number 2
> > [   28.148451] ieee80211 phy0: rt2x00usb_vendor_request: Error - Vendor Request 0x07 failed for offset 0x101c with error -19
> > [   29.828466] xhci-hcd xhci-hcd.0.auto: xHCI host not responding to stop endpoint command
> > [   29.856656] xhci-hcd xhci-hcd.0.auto: xHCI host controller not responding, assume dead
> > [   29.864804] xhci-hcd xhci-hcd.0.auto: HC died; cleaning up
> > [   29.949460] xhci-hcd xhci-hcd.0.auto: Late clearing port-1 CSC, portsc 0x202a0
> > 
> > What about checking whether anything is still connected on command timeout
> > and considering the device autosuspended instead of killing the host?
> Agree that we shouldn't kill it now that we know about this case.
> 
> Any idea what happens to the unhandled commands that are queued?
> Are they lost, or does host continue processing them after reconnect?

The host continues processing them; see the patch in another reply.
(This can also be verified on an unpatched driver by connecting a device,
even a different one, before the command timeout expires: the host then
continues operating normally.)

> If stop endpoint command times out without any device connected we should
> probably start by manually giving back pending/cancelled URBs.
> 
> There will probably be a couple more commands queued after this when endpoints
> are dropped and usb device freed (disable xhci slot)
> Need to figure out what to do with these.
> 
> host still seems to respond to register writes even if it doesn't handle commands,
> so entering suspend should be easier to tackle.

At first I tried to stop the endpoint with an explicit suspend, like this:
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 7a63bc56a195..7dadd2e4de8f 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -1900,7 +1900,7 @@ static int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
 		}
 		ep->ep_state |= EP_STOP_CMD_PENDING;
 		xhci_queue_stop_endpoint(xhci, command, urb->dev->slot_id,
-					 ep_index, 0);
+					 ep_index, 1);
 		xhci_ring_cmd_db(xhci);
 	}
 done:

This at least allows the command to finish, so the host is not killed, but I
haven't figured out how to resume operation.
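
For reference, the only thing the change from 0 to 1 in the diff above does
is set the Suspend (SP) bit in the Stop Endpoint command TRB. The following is
a minimal userspace sketch of how the control dword of that TRB is composed,
assuming the layout from the xHCI specification as I understand it (cycle bit
0, TRB type in bits 15:10 with Stop Endpoint = 15, endpoint ID in bits 20:16,
SP in bit 23, slot ID in bits 31:24). The function and macro names are
hypothetical and are not the kernel's own helpers.

```c
#include <stdint.h>

/* Assumed field positions in dword 3 of a Stop Endpoint command TRB. */
#define SKETCH_TRB_CYCLE          (1u << 0)
#define SKETCH_TRB_TYPE_SHIFT     10
#define SKETCH_TRB_STOP_ENDPOINT  15u
#define SKETCH_EP_ID_SHIFT        16
#define SKETCH_SP_BIT             (1u << 23)
#define SKETCH_SLOT_ID_SHIFT      24

/*
 * Build the control dword for a Stop Endpoint command.
 * ep_index is zero-based (as in the driver); the TRB carries ep_index + 1.
 * suspend selects the SP bit, which is what the diff above toggles.
 */
static uint32_t sketch_stop_ep_control(unsigned int slot_id,
				       unsigned int ep_index,
				       int suspend, int cycle)
{
	uint32_t field = 0;

	field |= (uint32_t)(slot_id & 0xffu) << SKETCH_SLOT_ID_SHIFT;
	field |= (uint32_t)((ep_index + 1) & 0x1fu) << SKETCH_EP_ID_SHIFT;
	field |= SKETCH_TRB_STOP_ENDPOINT << SKETCH_TRB_TYPE_SHIFT;
	if (suspend)
		field |= SKETCH_SP_BIT;
	if (cycle)
		field |= SKETCH_TRB_CYCLE;

	return field;
}
```

With slot 1 and zero-based endpoint index 2, the two variants differ only in
bit 23, so the rest of the command is unchanged by the hack.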


