Mathias Nyman wrote: > Hi Thinh > > Sorry about the delay. Np :) > > On 10.4.2021 3.47, Thinh Nguyen wrote: >> If there is a device with active enhanced super-speed (eSS) isoc IN >> endpoint(s) behind one or more eSS hubs, DWC_usb31 (v1.90a and prior) >> host controller will not detect the device disconnection until no more >> isoc URB is submitted. If there's a device disconnection, internally >> the wait for tHostTransactionTimeout (USB 3.2 spec 8.13) blocks the >> other endpoints from being scheduled. So, it blocks the interrupt >> endpoint of the eSS hub indicating the port change status. >> >> This can be an issue for applications that continuously submitting isoc >> URBs to the xHCI. To work around this, stop processing new URBs after 3 >> consecutive isoc transaction errors. If new isoc transfers are queued >> after the device is disconnected, the host will respond with USB >> transaction error. After 3 consecutive USB transaction errors, the >> driver can wait a period of time (at least 2 * largest periodic interval >> of the topology) without ringing isoc endpoint doorbell to detect the >> port change status. If there is no disconnection detected, ring the >> endpoint doorbell to resume isoc transfers. > > Is that enough? many Isoc URBs queue 16 - 32 Isoc TRBs per URB. > And drivers like UVC queue several URBs in advance. That's fine as long as the driver stops ringing more doorbell for a certain period of time creating a gap that's enough to get the notification from the interrupt endpoint. We tested with 128 isoc URBs and was able to detect a disconnect after this delay. > > If I remember correctly then a transaction errors won't stop Isoch endpoints, > so waiting for 2 * Interval after 3 consecutive transaction errors might not > be enough. > > How about stopping the endpoint after 3 consecutive transaction errors, > and restating it a bit later? There's no need to stop and restart the endpoint. > >> >> This workaround tracks the max eSS periodic interval every time there's >> an endpoint added or dropped, which happens when there's bandwidth >> check. So, scan the topology and update the xhci->max_ess_interval >> whenever there's a bandwidth check. Introduced a new flag >> VDEV_DISCONN_CHECK_PENDING to prevent ringing the doorbell while waiting >> for a disconnection status. After 2 * max_ess_interval time and no >> disconnection detected, a delayed work will ring the doorbell to resume >> the active isoc transfers. > > Sounds very elaborate for a vendor specific disconnect workaround. > Isn't there a simpler way? > > Maybe stop all isoc in endpoints if one them has 3 consecutive transaction error, > wait for 2x hub interrupt interval time, and then restart the endpoints if there is > no disconnect? We can also do this (but without stop + restart the endpoints). It just creates a slightly larger gap that may be more noticeable to the user if there's no actual disconnection. > > There is bigger concern with this series, it scatters a lot of vendor specific code > around the generic xhci driver. It's not very clear afterwards what code is part of the > workaround and what is generic code. > > We just got a lot of the Mediatek code moved to xhci-mtk*, maybe its time to add xhci-snps.c > instead of using the generic platform driver with tons of workarounds and quirks. > Thanks for the reviews. I need to look into how this can be done. May need your suggestion as not every scenarios can be overridden easily/cleanly. What about the other quirks, do you have any comments? Thanks, Thinh