On Fri, 28 Feb 2025 18:18:24 +0200, Mathias Nyman wrote: > Unplugging a USB3.0 webcam from Etron hosts while streaming results > in errors like this: > > [ 2.646387] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr > not part of current TD ep_index 18 comp_code 13 [ 2.646446] xhci_hcd > 0000:03:00.0: Looking for event-dma 000000002fdf8630 trb-start > 000000002fdf8640 trb-end 000000002fdf8650 [ 2.646560] xhci_hcd > 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD > ep_index 18 comp_code 13 [ 2.646568] xhci_hcd 0000:03:00.0: Looking > for event-dma 000000002fdf8660 trb-start 000000002fdf8670 trb-end > 000000002fdf8670 > > Etron xHC generates two transfer events for the TRB if an error is > detected while processing the last TRB of an isoc TD. > > The first event can be any sort of error (like USB Transaction or > Babble Detected, etc), and the final event is Success. > > The xHCI driver will handle the TD after the first event and remove it > from its internal list, and then print an "Transfer event TRB DMA ptr > not part of current TD" error message after the final event. > > Commit 5372c65e1311 ("xhci: process isoc TD properly when there was a > transaction error mid TD.") is designed to address isoc transaction > errors, but unfortunately it doesn't account for this scenario. > > This issue is similar to the XHCI_SPURIOUS_SUCCESS case where a > success event follows a 'short transfer' event, but the TD the event > points to is already given back. > > Expand the spurious success 'short transfer' event handling to cover > the spurious success after error on Etron hosts. > > Kuangyi Chiang reported this issue and submitted a different solution > based on using error_mid_td. This commit message is mostly taken > from that patch. > > Reported-by: Kuangyi Chiang <ki.chiang65@xxxxxxxxx> > Signed-off-by: Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> Works here too, modulo the obvious build problem. Etron with errors: [ 1064.865311] xhci_hcd 0000:06:00.0: Transfer error for slot 1 ep 2 on endpoint [ 1064.865322] xhci_hcd 0000:06:00.0: Successful completion on short TX for slot 1 ep 2 with last td comp code 4 [ 1064.865326] xhci_hcd 0000:06:00.0: Spurious event dma 0x00000000ffef88c0, comp_code 13 after 4 Renesas with short packets: [ 1365.299218] xhci_hcd 0000:08:00.0: Successful completion on short TX for slot 1 ep 2 with last td comp code 13 [ 1365.299223] xhci_hcd 0000:08:00.0: Spurious event dma 0x00000000ffbbf870, comp_code 13 after 13 BTW, it says "comp_code 13 after something" because of this crazy TRUST_TX_LENGTH hack, which changes trb_comp_code if it's success but the residual is nonzero. If I remove the hack, Etron: [ 2713.630443] xhci_hcd 0000:06:00.0: Spurious event dma 0x00000000ff3b6130, comp_code 1 after 4 Renesas: [ 4033.652300] xhci_hcd 0000:08:00.0: Spurious event dma 0x00000000ffcd1b80, comp_code 1 after 13 The hack could almost be removed now, but if there really are HCs which report Success on the first event, this won't work for them: > +static bool xhci_spurious_success_tx_event(struct xhci_hcd *xhci, > + struct xhci_ring *ring) > +{ > + switch (ring->old_trb_comp_code) { > + case COMP_SHORT_PACKET: > + return xhci->quirks & XHCI_SPURIOUS_SUCCESS; Could it work without relying on fictional COMP_SHORT_PACKET events? > + if (xhci_spurious_success_tx_event(xhci, ep_ring)) { > + xhci_dbg(xhci, "Spurious event dma %pad, comp_code %u after %u\n", > + &ep_trb_dma, trb_comp_code, ep_ring->old_trb_comp_code); > + ep_ring->old_trb_comp_code = trb_comp_code; This part will (quite arbitrarily IMO) not execute if td_list is empty. I had this idea that "empty td_list" and "no matching TD on td_list" are practically identical cases, and their code could be merged.