Re: [RFT PATCH] xhci: process isoc TD properly when there was an error mid TD.

Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> · Wed, 17 Jan 2024 12:46:00 +0200

On 17.1.2024 0.20, Michał Pecio wrote:
I applied your patch on v6.7 and it appears to be working. It removes
the disconnection spam and also handles intermittent transmission errors
on UVC without obvious glitches or error messages, except one xhci_dbg
added to confirm that I'm really hitting this edge case.

Anything else that might be worth testing?

I have a question, though. What happens if there is no next TD because
a mid TD error has occurred on the last packet queued by the client? Is
there any mechanism to retire that stuck TD on a NEC host which submits
one mid TD error event and then goes silent?

In disconnect cases usb core should flush the remaining URBs once
roothub code notices the disconnect.

But yes, if the last TD in a URB is a multi TRB isoc TD, and it has an error
mid TD, then it's stuck until timeout.

Would it be possible to retire the TD right after the first failed TRB?
(I imagine difficulties in determining when exactly the host has moved
its internal pointer past the remaining TRBs so they can be reused).

Probably not as a normal error handling routine.
We have the same "Transfer event TRB DMA ptr not part of current TD" issue
for hosts that do issue an event for the last TRB.

If the TD is given back immediately we also have a memory issue as the
DMA address pointed to by that last TRB might be accessed by the controller
_after_ driver gave back the TD, and possibly freed/unmapped it.

But for that special case where there are no more TDs queued it might
make sense.

-			if (!ep->skip ||
-			    !usb_endpoint_xfer_isoc(&td->urb->ep->desc)) {
+			if (ep->skip && usb_endpoint_xfer_isoc(&td->urb->ep->desc)) {
I like this. I would suggest another cleanup: the if (!ep_seg && stuff)
right above your change could be pulled inside if (!ep_seg).

Noticed the same, but for stable kernel reasons it's probably better to limit
this patch to mostly fixing this bug.

+			 * if there was an error event mid TD then host may not
+			 * give an event for the last TRB on an isoc TD.
+			 * This event can be for the next TD, See xHCI 4.9.1.
This seems to suggest that 4.9.1 encourages such behavior, but the
opposite is the case as far as I understand.

I'll rephrase this.

+			if (td->error_mid_td) {
+				struct xhci_td *td_next = list_next_entry(td, td_list);
This if needs && !list_is_last(&td->td_list, &ep_ring->td_list).

Thanks, nice catch, good point.

Otherwise a serious bug in the host (maybe in the driver too) tricks
us into grabbing a pointer to ep_ring instead, filling the subsequent
"TRB not part of current TD" message with mystifying garbage numbers.
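
To illustrate the point about list_is_last(), here is a minimal userspace
sketch, not the actual xhci driver code. The list helpers are simplified
reimplementations of the kernel's <linux/list.h>, and struct td, struct ring,
next_td_or_null() and next_td_self_check() are hypothetical names made up for
this example. It only shows that the entry after the last TD is the ring's own
list head, so the lookup has to be guarded.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's <linux/list.h> helpers. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static int list_is_last(const struct list_head *entry,
			const struct list_head *head)
{
	return entry->next == head;
}

/* Hypothetical, stripped-down versions of the TD and ring structures. */
struct td {
	int id;
	struct list_head td_list;
};

struct ring {
	struct list_head td_list;	/* list head, not embedded in a TD */
};

/*
 * Return the TD queued after @td, or NULL if @td is the last one.
 * Without the list_is_last() check, container_of() would be applied to
 * the ring's own list head and yield a pointer that is not a real TD.
 */
static struct td *next_td_or_null(struct td *td, struct ring *ring)
{
	if (list_is_last(&td->td_list, &ring->td_list))
		return NULL;
	return container_of(td->td_list.next, struct td, td_list);
}

/* Build a ring with two TDs and check both cases; returns 0 on success. */
static int next_td_self_check(void)
{
	struct ring ring;
	struct td first = { .id = 1 }, second = { .id = 2 };

	list_init(&ring.td_list);
	list_add_tail(&first.td_list, &ring.td_list);
	list_add_tail(&second.td_list, &ring.td_list);

	assert(next_td_or_null(&first, &ring) == &second);
	assert(next_td_or_null(&second, &ring) == NULL);
	return 0;
}
```

Calling next_td_self_check() from a small main() exercises both the normal
case and the last-TD case.
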

+				if (ep_seg) {
+					/* give back previous TD, start handling new */
Suggested:
+					xhci_dbg(xhci, "Missing completion event after mid TD error\n");

Makes sense.

Thanks
Mathias