On Mon, Apr 17, 2023, Avichal Rakesh wrote: > On Mon, Apr 17, 2023 at 7:49 PM Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx> wrote: > > > > On Mon, Apr 17, 2023, Avichal Rakesh wrote: > > > > > > On 4/17/23 06:49, Greg KH wrote: > > > > On Fri, Apr 14, 2023 at 02:03:02PM -0700, Avichal Rakesh wrote: > > > > > > > >> This problem may be further exaggerated by the DWC3 controller driver > > > >> (which is what my device has) not setting the IMI flag when > > > >> no_interrupt flag is set > > > >> (https://urldefense.com/v3/__https://lore.kernel.org/all/ced336c84434571340c07994e3667a0ee284fefe.1666735451.git.Thinh.Nguyen@xxxxxxxxxxxx/)?__;!!A4F2R9G_pg!fB0krUX6qsOXrBsOHzLstVFWqTLL9ncwGFXxlxK0eedLY_29XdwjXRtbO-EWv1eX5okN1rOeBZspZ21KSb5b$ > > > >> UVC Gadget Driver sets the no_interrupt flag for ~3/4 of its queued > > > >> usb_request, so an ISOC failure may not immediately interrupt the UVC > > > >> gadget driver, leaving more time for the frame to finish encoding. > > > >> > > > >> I couldn't find any concrete error handling rules in the UVC specs, so > > > >> I am not sure what the proper solution here is. To try out, I created > > > >> a patch (attached below) that dequeues all queued usb_requests from > > > >> the endpoint in case of an ISOC failure and clears the uvc buffer > > > >> queue. This eliminated the partial frames with no perceivable frame > > > >> drops. > > > >> > > > >> So my questions here are: > > > >> 1. Is this a known issue, and if so are there workarounds for it? > > > >> 2. If the answer to above is "No", does the explanation and mitigation > > > >> seem reasonable? > > > >> > > > >> Patch follows (mostly for illustration, I can formalize it if > > > >> needed!). It adds a new 'req_inflight' list to track queued > > > >> usb_requests that have not been given back to the gadget driver and > > > >> drops all the queued requests in case of an ISOC failure. The other > > > >> changes are for the extra bookkeeping required to handle dropping all > > > >> frames. I haven't been able to confirm it, but as far as I can tell > > > >> the issue exists at ToT as well. > > > >> > > > > Perhaps this conversation with Jeff may explain the issues you observed: > > https://urldefense.com/v3/__https://lore.kernel.org/all/20221021164349.fft4yqnxuztsqdeu@xxxxxxxxxxxx/__;!!A4F2R9G_pg!eK3VyAq7vX70vA-WJLA6_bPMbZFx0n33wH39JItHzwCNVqKSkN2aG0izF0GZPWqYxkgL-fNinWlIW71HbGGC$ > > > > To summarized the long conversation, the UVC needs to maintain a > > constant queue of usb requests. Each request is scheduled for a specific > > interval. If the UVC couldn't keep up feeding requests to the > > controller, then we will have stale requests and missed isoc. > > > > From the discussion and review, the UVC couldn't queue requests fast > > enough. The problem is exacerbated when you reduce the interrupt > > frequency with no_interrupt setting. The dwc3 driver has some basic > > mechanism to detect for underrun by checking if the queue is empty, but > > that's not enough to workaround it. > > > > Your suggestion to dequeue the request would mean the controller driver > > will reschedule the isoc transfer again, which reschedule the next > > request for a new interval (potentially avoid being stale). That's fine, > > but that may not be a great solution because we're still playing catch > > up and the missed isoc already happened. > > > > You may try to do the followings to mitigate this issue: > > 1) Make sure UVC to only queuing rate is constant. Perhaps queue dummy > > requests in between time gaps if need be? > > > This is extremely helpful, thank you! I have a somewhat basic > question: For an isoc request, is its "freshness" determined from the > time at which usb_ep_queue is called, or from the time at which the > previous request was transferred to the host? If the "freshness" is > determined from the time 'usb_ep_request' was called, I wonder if the > "underrun" is artificial because UVC Gadget driver pumps all free > usb_requests at once, and then waits around doing nothing while the > usb_requests get stale in the controller's queue. In this case, just > slowing the encode loop to wait a little before queuing more requests > should do effectively what you suggest above? > Here's a simplified version of how it works: (Note that I'll be talking/using usb request as if it's TRBs and also ignore SG. I will also only focus about the current handling of the gadget driver and dwc3 driver.) The controller tracks the current uframe. For isoc endpoint, when you call usb_ep_queue(), the controller doesn't consume the request right away. The request is scheduled for a specific interval. For UVC, an interval is a single uframe (125us). As the current uframe keeps incrementing, each request on a queue is consumed. If there's no request available for the current uframe, the next queued request is considered stale or expired. This case is considered underrun. "freshness" means the request is scheduled for a future uframe. "stale" means the request is queued for a past uframe. Isoc data is time sensitive. So if the data isn't gone out at a specific time, then it's dropped. The problem with many gadget drivers that use isoc endpoint is that they only queue more requests when they get notified that the previous set of requests are completed. This creates time gaps which can starve the controller of more requests. We have some basic mechanism in dwc3 to reschedule new requests when there's missed isoc and the queue is empty for UVC. However that's not enough. The time the controller consumes the request and the time the driver handles the interrupt is different. The driver may not know that queue is empty until it's already too late. The gadget driver just needs to make sure to keep the request queue to at least X amount of requests. Note that 125us is relatively fast. Note that this talking point is assuming that the UVC host behaving properly and poll for data every uframe. If not, it's a different issue. > > 2) Enhance dwc3 to detect underrun from UVC on request queue. ie. If the > > queue is empty and a new request is queue, reschedule the isoc transfer. > > This will break some other devices if time guarantee is required. So > > perhaps an additional flag is needed for this change of behavior. > > > > I am not as familiar with the DWC3, but just to clarify: Is your > suggestion to reschedule the stale requests if the controller's send > queue is empty (underrun) and the UVC gadget driver queues a new > request? Depending on how many stale requests there are, would that > run the risk of timing out the new request as well? > No, rescheduling here means all the queued requests will be scheduled for a new future uframe. But also note that the queue empty check is insufficient as explained above. So this would only mitigate the issue. BR, Thinh