On 22.1.2024 11.03, Michał Pecio wrote:
Apparently a babble error, and it seems to have generated a "success" which the event handler tried to match with the next TD. So a mid TD babble may need the same treatment, which is not surprising.
Makes sense.
This is now confirmed and fixed here. The change is obvious enough:

 	case COMP_ISOCH_BUFFER_OVERRUN:
 	case COMP_BABBLE_DETECTED_ERROR:
+		error_mid_td = true;
 		frame->status = -EOVERFLOW;
 		break;

I don't know yet what COMP_ISOCH_BUFFER_OVERRUN means, but I guess it's the same story. BTW, error_mid_td is a local variable now and I use the urb_length_set flag instead, as explained before.
To me it looks like COMP_BABBLE_DETECTED_ERROR and COMP_ISOCH_BUFFER_OVERRUN have the same cause: the device is sending too much data. Isoc endpoints should use COMP_ISOCH_BUFFER_OVERRUN to indicate that the endpoint hasn't halted, unlike the COMP_BABBLE_DETECTED_ERROR case, where it does halt. See xHCI 6.4.5 "TRB Completion Codes", footnote 115.
I found that it can be reproduced on the VIA host; with enough tries it can happen even on a chained TD. NEC doesn't signal these babble errors, but the new mid TD event handling should cope with either host.
So it looks like the VIA host incorrectly sends babble for isoc endpoints.
Debug trace ("interesting" is other than "success" or "short packet"):

[ 4113.376349] xhci_hcd 0000:03:00.0: handle_tx_event interesting ep_trb_dma 132961000 comp_code 3 slot 2 ep 2
[ 4113.376361] xhci_hcd 0000:03:00.0: handle_tx_event first_trb 132961000 last_trb 132961010
[ 4113.376364] xhci_hcd 0000:03:00.0: Error mid isoc TD, wait for final completion event
[ 4113.376366] xhci_hcd 0000:03:00.0: handle_tx_event uninteresting ep_trb_dma 132961010 comp_code 1 slot 2 ep 2
[ 4113.376369] xhci_hcd 0000:03:00.0: handle_tx_event first_trb 132961000 last_trb 132961010
[ 4113.376371] xhci_hcd 0000:03:00.0: Got SUCCESS after mid TD error
[ 4113.376373] xhci_hcd 0000:03:00.0: finish_td comp_code 1 status -75
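
For illustration only, the sequence in the trace above can be modelled roughly as below. This is a toy sketch, not the driver code: struct toy_td and toy_handle_event() are hypothetical names, the TD is reduced to a first/last TRB address pair, and only the three completion codes seen in this thread are handled. The numeric values are the xHCI completion code values (1, 3 and 31). The one point it tries to show is that a success event for the last TRB must not overwrite the status saved by an earlier mid TD babble or overrun.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Completion codes from the xHCI spec; names mirror the ones in this thread. */
enum {
	COMP_SUCCESS = 1,
	COMP_BABBLE_DETECTED_ERROR = 3,
	COMP_ISOCH_BUFFER_OVERRUN = 31,
};

/* Hypothetical, heavily simplified isoc TD: just a range of TRB addresses. */
struct toy_td {
	uint64_t first_trb;
	uint64_t last_trb;
	bool error_mid_td;	/* an error was seen before the last TRB */
	int status;		/* what would end up in frame->status */
	bool done;		/* TD has been given back */
};

/*
 * Handle one transfer event for the TD.
 * Returns true once the TD is finished, false if we keep waiting for the
 * completion event of the last TRB.
 */
static bool toy_handle_event(struct toy_td *td, uint64_t ep_trb_dma, int comp_code)
{
	assert(td != NULL);

	bool is_last = (ep_trb_dma == td->last_trb);

	switch (comp_code) {
	case COMP_SUCCESS:
		/* Keep the saved error if one was seen earlier in this TD. */
		if (!td->error_mid_td)
			td->status = 0;
		break;
	case COMP_ISOCH_BUFFER_OVERRUN:
	case COMP_BABBLE_DETECTED_ERROR:
		if (!is_last)
			td->error_mid_td = true;
		td->status = -EOVERFLOW;
		break;
	default:
		/* Other completion codes are not modelled in this sketch. */
		break;
	}

	if (!is_last)
		return false;	/* wait for the final completion event */

	td->done = true;
	return true;
}
```

Feeding it the two events from the trace (comp_code 3 on 132961000, then comp_code 1 on 132961010) finishes the TD on the second event with status -EOVERFLOW, which matches the "finish_td comp_code 1 status -75" line.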
I'm afraid we'll end up tuning that original patch forever with these new findings, so let's make this into a new patch on top of the previous one. That one is tested and known to work in the transaction error case.

Let me know if you want to write it, otherwise I will.

Thanks
Mathias