Hi Thinh

Sorry about the delay.

On 10.4.2021 3.47, Thinh Nguyen wrote:
> If there is a device with active enhanced super-speed (eSS) isoc IN
> endpoint(s) behind one or more eSS hubs, DWC_usb31 (v1.90a and prior)
> host controller will not detect the device disconnection until no more
> isoc URB is submitted. If there's a device disconnection, internally
> the wait for tHostTransactionTimeout (USB 3.2 spec 8.13) blocks the
> other endpoints from being scheduled. So, it blocks the interrupt
> endpoint of the eSS hub indicating the port change status.
>
> This can be an issue for applications that continuously submitting isoc
> URBs to the xHCI. To work around this, stop processing new URBs after 3
> consecutive isoc transaction errors. If new isoc transfers are queued
> after the device is disconnected, the host will respond with USB
> transaction error. After 3 consecutive USB transaction errors, the
> driver can wait a period of time (at least 2 * largest periodic interval
> of the topology) without ringing isoc endpoint doorbell to detect the
> port change status. If there is no disconnection detected, ring the
> endpoint doorbell to resume isoc transfers.

Is that enough?
Many isoc URBs queue 16 - 32 isoc TRBs per URB, and drivers like UVC queue
several URBs in advance.
If I remember correctly, transaction errors won't stop isoc endpoints, so
waiting for 2 * interval after 3 consecutive transaction errors might not be
enough.

How about stopping the endpoint after 3 consecutive transaction errors, and
restarting it a bit later?

>
> This workaround tracks the max eSS periodic interval every time there's
> an endpoint added or dropped, which happens when there's bandwidth
> check. So, scan the topology and update the xhci->max_ess_interval
> whenever there's a bandwidth check. Introduced a new flag
> VDEV_DISCONN_CHECK_PENDING to prevent ringing the doorbell while waiting
> for a disconnection status. After 2 * max_ess_interval time and no
> disconnection detected, a delayed work will ring the doorbell to resume
> the active isoc transfers.

Sounds very elaborate for a vendor-specific disconnect workaround.
Isn't there a simpler way?

Maybe stop all isoc IN endpoints if one of them has 3 consecutive transaction
errors, wait for 2x the hub interrupt interval, and then restart the endpoints
if there is no disconnect?

There is a bigger concern with this series: it scatters a lot of
vendor-specific code around the generic xhci driver. It's not very clear
afterwards what code is part of the workaround and what is generic code.

We just got a lot of the Mediatek code moved to xhci-mtk*, maybe it's time to
add xhci-snps.c instead of using the generic platform driver with tons of
workarounds and quirks.

Thanks
-Mathias