Re: [PATCH v2 3/7] usb: xhci: Check for blocked disconnection

Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx> · Tue, 27 Apr 2021 22:30:43 +0000

Mathias Nyman wrote:
> Hi Thinh
> 
> Sorry about the delay. 

Np :)

> 
> On 10.4.2021 3.47, Thinh Nguyen wrote:
>> If there is a device with active enhanced super-speed (eSS) isoc IN
>> endpoint(s) behind one or more eSS hubs, DWC_usb31 (v1.90a and prior)
>> host controller will not detect the device disconnection until no more
>> isoc URB is submitted. If there's a device disconnection, internally
>> the wait for tHostTransactionTimeout (USB 3.2 spec 8.13) blocks the
>> other endpoints from being scheduled. So, it blocks the interrupt
>> endpoint of the eSS hub indicating the port change status.
>>
>> This can be an issue for applications that continuously submit isoc
>> URBs to the xHCI. To work around this, stop processing new URBs after 3
>> consecutive isoc transaction errors. If new isoc transfers are queued
>> after the device is disconnected, the host will respond with USB
>> transaction error. After 3 consecutive USB transaction errors, the
>> driver can wait a period of time (at least 2 * largest periodic interval
>> of the topology) without ringing isoc endpoint doorbell to detect the
>> port change status. If there is no disconnection detected, ring the
>> endpoint doorbell to resume isoc transfers.
> 
> Is that enough? Many isoc URBs queue 16 - 32 isoc TRBs per URB.
> And drivers like UVC queue several URBs in advance.

That's fine as long as the driver stops ringing the doorbell for a
certain period of time, creating a gap long enough to get the
notification from the interrupt endpoint. We tested with 128 isoc URBs
and were able to detect a disconnect after this delay.
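
Roughly, the gap logic can be modelled as below. This is a standalone
sketch, not the patch code: the struct and function names are
hypothetical, and time is passed in as a plain microsecond counter. Only
the threshold of 3 consecutive errors and the 2 * max interval delay come
from the patch description; the 125 us * 2^(bInterval - 1) service
interval is the usual eSS periodic endpoint encoding.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ISOC_ERR_THRESHOLD 3	/* consecutive errors, per the patch description */

/* Hypothetical per-device state; not the driver's real structures. */
struct disconn_check {
	unsigned int consecutive_errors;
	bool pending;		/* models VDEV_DISCONN_CHECK_PENDING */
	uint64_t resume_at_us;	/* earliest time the doorbell may ring again */
};

/* Service interval of an eSS periodic endpoint: 2^(bInterval-1) * 125us. */
static uint64_t ess_interval_us(unsigned int b_interval)
{
	if (b_interval < 1)
		b_interval = 1;
	if (b_interval > 16)
		b_interval = 16;
	return (uint64_t)125 << (b_interval - 1);
}

/* Largest periodic interval among the given endpoints (0 if none). */
static uint64_t max_ess_interval_us(const unsigned int *b_intervals, size_t n)
{
	uint64_t max = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t v = ess_interval_us(b_intervals[i]);

		if (v > max)
			max = v;
	}
	return max;
}

/* Account for one completed isoc transfer. */
static void isoc_complete(struct disconn_check *dc, bool transaction_error,
			  uint64_t now_us, uint64_t max_interval_us)
{
	if (!transaction_error) {
		dc->consecutive_errors = 0;
		return;
	}
	if (dc->pending)
		return;		/* already waiting for the port status */
	if (++dc->consecutive_errors >= ISOC_ERR_THRESHOLD) {
		dc->pending = true;
		dc->resume_at_us = now_us + 2 * max_interval_us;
	}
}

/* True if the isoc endpoint doorbell may be rung at now_us. */
static bool may_ring_doorbell(struct disconn_check *dc, uint64_t now_us,
			      bool disconnected)
{
	if (!dc->pending)
		return true;
	if (disconnected || now_us < dc->resume_at_us)
		return false;	/* keep the gap open */
	/* Gap elapsed with no disconnect seen: resume isoc transfers. */
	dc->pending = false;
	dc->consecutive_errors = 0;
	return true;
}
```

The point is that URBs can keep being queued during the gap; only the
doorbell is held back, so the endpoint itself is never stopped.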

> 
> If I remember correctly, transaction errors won't stop isoc endpoints,
> so waiting for 2 * Interval after 3 consecutive transaction errors might not
> be enough.
> 
> How about stopping the endpoint after 3 consecutive transaction errors,
> and restarting it a bit later?

There's no need to stop and restart the endpoint.

> 
>>
>> This workaround tracks the max eSS periodic interval every time there's
>> an endpoint added or dropped, which happens when there's bandwidth
>> check. So, scan the topology and update the xhci->max_ess_interval
>> whenever there's a bandwidth check. Introduced a new flag
>> VDEV_DISCONN_CHECK_PENDING to prevent ringing the doorbell while waiting
>> for a disconnection status. After 2 * max_ess_interval time and no
>> disconnection detected, a delayed work will ring the doorbell to resume
>> the active isoc transfers.
> 
> Sounds very elaborate for a vendor specific disconnect workaround.
> Isn't there a simpler way?
> 
> Maybe stop all isoc IN endpoints if one of them has 3 consecutive transaction errors,
> wait for 2x hub interrupt interval time, and then restart the endpoints if there is
> no disconnect?

We can also do this (but without stopping and restarting the endpoints).
It just creates a slightly larger gap that may be more noticeable to the
user if there's no actual disconnection.

> 
> There is a bigger concern with this series: it scatters a lot of vendor specific code
> around the generic xhci driver. It's not very clear afterwards what code is part of the
> workaround and what is generic code.
> 
> We just got a lot of the Mediatek code moved to xhci-mtk*, maybe it's time to add xhci-snps.c
> instead of using the generic platform driver with tons of workarounds and quirks.
> 

Thanks for the review. I need to look into how this can be done. I may
need your suggestions, as not every scenario can be overridden
easily/cleanly.

What about the other quirks, do you have any comments?

Thanks,
Thinh