Re: [PATCH v4 1/1] xhci: Correctly handle last TRB of isoc TD on Etron xHCI host

Michał Pecio <michal.pecio@xxxxxxxxx> · Mon, 10 Feb 2025 09:57:36 +0100

On Fri, 7 Feb 2025 14:06:54 +0200, Mathias Nyman wrote:
> On 6.2.2025 0.42, Michał Pecio wrote:
> > error_mid_td can cope with hosts which don't produce the extra
> > success event, it was done this way to deal with buggy NECs. The
> > cost is one more ESIT of latency on TDs with error.  
> 
> It makes giving back the TD depend on a future event we can't
> guarantee.

For the record, this is not the same disaster as failing to give back
an unlinked URB. Situation here is no different from usual 'error mid
TD' case on buggy HCs (known so far: NEC uPD720200 and most if not all
of ASMedia, including at least 1st gen AMD Promontory).

We are owed an event in the next ESIT, worst case it will be something
weird like Missed Service or Ring Underrun. I've sent you some patches
for that, they also apply to the existing NEC/ASMedia problem.

> I still think it better fits the spurious success case.
> It's not an error mid TD, it's a spurious success event sent by host
> after a completion (error) event for the last TRB in the TD.

Legally you are right of course, but materially we know what happens:
the damn thing still holds an internal reference to the last TRB for
some unknown reason. We would need to know that it doesn't actually
use it for anything and will not mind the TRB being overwritten.

This may well be true, but I guess I prefer known evils over unknown
ones so I immediately suggested using 'erorr mid TD' instead.

> Making this change to error_mid_td code also makes that code more
> confusing and harder to follow.

I will see if I can come up with something clean.

I will also try the patch you sent, it looks like it would work.

One thing I don't like is that it fails to distinguish the known
spurious events from other invalid events due to hardware or kernel
bugs. In the past last_td_was_short caused me similar problems.

Regards,
Michal