On 21.5.2020 6.45, Rik van Riel wrote: > On Wed, 2020-05-20 at 16:34 -0400, Alan Stern wrote: >> On Wed, May 20, 2020 at 03:21:44PM -0400, Rik van Riel wrote: >>> >>> Interesting. That makes me really curious why things are >>> getting stuck, now... >> >> This could be a bug in xhci-hcd. Perhaps the controller's endpoint >> state needs to be updated after one of these errors occurs. Mathias >> will know all about that. > > I am seeing something potentially interesting in the > giant trace. First the final enqueue/dequeue before > the babble error: > > <idle>-0 [005] d.s. 776367.638233: xhci_inc_enq: ISOC > 0000000033a6879e: enq 0x0000001014070420(0x0000001014070000) deq > 0x0000001014070360(0x0000001014070000) segs 2 stream 0 free_trbs 497 > bounce 196 cycle 1 > > The next reference to 0x0000001014070360 is the babble error, > and some info on the ISOC buffer itself: > > <idle>-0 [005] d.h. 776367.639187: xhci_handle_event: > EVENT: TRB 0000001014070360 status 'Babble Detected' len 196 slot 15 ep > 9 type 'Transfer Event' flags e:C > <idle>-0 [005] d.h. 776367.639195: xhci_handle_transfer: > ISOC: Buffer 0000000e2676f400 length 196 TD size 0 intr 0 type 'Isoch' > flags b:i:I:c:s:I:e:C >n > Immediately after the babble error, the next request is enqueued, > and the doorbell is rung: > > <idle>-0 [005] d.h. 776367.639196: xhci_inc_deq: ISOC 0000000033a6879e: enq 0x0000001014070420(0x0000001014070000) deq 0x0000001014070370(0x0000001014070000) segs 2 stream 0 free_trbs 498 bounce 196 cycle 1 > <idle>-0 [005] d.h. 776367.639197: xhci_urb_giveback: ep4in-isoc: urb 0000000072126553 pipe 135040 slot 15 length 196/196 sgs 0/0 stream 0 flags 00000206 > <idle>-0 [005] d.h. 776367.639197: xhci_inc_deq: EVENT 0000000097f84b16: enq 0x00000010170b5000(0x00000010170b5000) deq 0x00000010170b5670(0x00000010170b5000) segs 1 stream 0 free_trbs 254 bounce 0 cycle 1 > <idle>-0 [005] ..s. 776367.639212: xhci_urb_enqueue: ep4in-isoc: urb 0000000072126553 pipe 135040 slot 15 length 0/196 sgs 0/0 stream 0 flags 00000206 > <idle>-0 [005] d.s. 776367.639214: xhci_queue_trb: ISOC: Buffer 0000000e2676f400 length 196 TD size 0 intr 0 type 'Isoch' flags b:i:I:c:s:I:e:c > <idle>-0 [005] d.s. 776367.639214: xhci_inc_enq: ISOC 0000000033a6879e: enq 0x0000001014070430(0x0000001014070000) deq 0x0000001014070370(0x0000001014070000) segs 2 stream 0 free_trbs 497 bounce 196 cycle 1 > <idle>-0 [005] d.s. 776367.639215: xhci_ring_ep_doorbell: Ring doorbell for Slot 15 ep4in > > However, after that point, no more xhci_handle_transfer: ISOC > lines ar seen in the log. The doorbell line above is the last > line in the log for ep4in. > > Is this some area where USB3 and USB2 behave differently? It acts as if the endpoint is no longer running. If the endpoint would be halted then xhci_requires_manual_halt_cleanup() should reset the endpoints and it would show in the traces. Could you add the code below and take new traces, it will show the endpoint state after the Babble error. diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 0fda0c0f4d31..373d89ef7275 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -2455,6 +2455,7 @@ static int handle_tx_event(struct xhci_hcd *xhci, case COMP_BABBLE_DETECTED_ERROR: xhci_dbg(xhci, "Babble error for slot %u ep %u on endpoint\n", slot_id, ep_index); + trace_xhci_handle_tx_event(ep_ctx); status = -EOVERFLOW; break; /* Completion codes for endpoint error state */ diff --git a/drivers/usb/host/xhci-trace.h b/drivers/usb/host/xhci-trace.h index b19582b2a72c..5081df079f4a 100644 --- a/drivers/usb/host/xhci-trace.h +++ b/drivers/usb/host/xhci-trace.h @@ -360,6 +360,11 @@ DEFINE_EVENT(xhci_log_ep_ctx, xhci_add_endpoint, TP_ARGS(ctx) ); +DEFINE_EVENT(xhci_log_ep_ctx, xhci_handle_tx_event, + TP_PROTO(struct xhci_ep_ctx *ctx), + TP_ARGS(ctx) +); + DECLARE_EVENT_CLASS(xhci_log_slot_ctx, TP_PROTO(struct xhci_slot_ctx *ctx), TP_ARGS(ctx),