Mathias Nyman wrote: > On 28.4.2021 1.30, Thinh Nguyen wrote: >>> On 10.4.2021 3.47, Thinh Nguyen wrote: >>>> If there is a device with active enhanced super-speed (eSS) isoc IN >>>> endpoint(s) behind one or more eSS hubs, DWC_usb31 (v1.90a and prior) >>>> host controller will not detect the device disconnection until no more >>>> isoc URB is submitted. If there's a device disconnection, internally >>>> the wait for tHostTransactionTimeout (USB 3.2 spec 8.13) blocks the >>>> other endpoints from being scheduled. So, it blocks the interrupt >>>> endpoint of the eSS hub indicating the port change status. >>>> >>>> This can be an issue for applications that continuously submitting isoc >>>> URBs to the xHCI. To work around this, stop processing new URBs after 3 >>>> consecutive isoc transaction errors. If new isoc transfers are queued >>>> after the device is disconnected, the host will respond with USB >>>> transaction error. After 3 consecutive USB transaction errors, the >>>> driver can wait a period of time (at least 2 * largest periodic interval >>>> of the topology) without ringing isoc endpoint doorbell to detect the >>>> port change status. If there is no disconnection detected, ring the >>>> endpoint doorbell to resume isoc transfers. >>> >>> Is that enough? many Isoc URBs queue 16 - 32 Isoc TRBs per URB. >>> And drivers like UVC queue several URBs in advance. >> >> That's fine as long as the driver stops ringing more doorbell for a >> certain period of time creating a gap that's enough to get the >> notification from the interrupt endpoint. We tested with 128 isoc URBs >> and was able to detect a disconnect after this delay. > > Ok, if not ringing the endpoint is enough then that is better than > stopping the whole endpoint. > >>>> This workaround tracks the max eSS periodic interval every time there's >>>> an endpoint added or dropped, which happens when there's bandwidth >>>> check. So, scan the topology and update the xhci->max_ess_interval >>>> whenever there's a bandwidth check. Introduced a new flag >>>> VDEV_DISCONN_CHECK_PENDING to prevent ringing the doorbell while waiting >>>> for a disconnection status. After 2 * max_ess_interval time and no >>>> disconnection detected, a delayed work will ring the doorbell to resume >>>> the active isoc transfers. >>> >>> Sounds very elaborate for a vendor specific disconnect workaround. >>> Isn't there a simpler way? >>> >>> Maybe stop all isoc in endpoints if one them has 3 consecutive transaction error, >>> wait for 2x hub interrupt interval time, and then restart the endpoints if there is >>> no disconnect? >> >> We can also do this (but without stop + restart the endpoints). It just >> creates a slightly larger gap that may be more noticeable to the user if >> there's no actual disconnection. > > Ok, if blocking the doorbell is enough then it sounds better. > > How about that max interval tracking, is it necessary? > It will block the doorbell from 250us up to several seconds. > Is there some reasonable single value that can be used instead? Yeah, I agree with you. I think we can find a number that satisfies with most applications while keeping this change small and less intrusive. > >>> >>> There is bigger concern with this series, it scatters a lot of vendor specific code >>> around the generic xhci driver. It's not very clear afterwards what code is part of the >>> workaround and what is generic code. >>> >>> We just got a lot of the Mediatek code moved to xhci-mtk*, maybe its time to add xhci-snps.c >>> instead of using the generic platform driver with tons of workarounds and quirks. >>> >> >> Thanks for the reviews. I need to look into how this can be done. May >> need your suggestion as not every scenarios can be overridden >> easily/cleanly. > > true, no easy overrides for this transfer error / doorbell blocking workaround. > Needs a bit of work > Thanks, Thinh