On Tue, Apr 18, 2023, Avichal Rakesh wrote: > On Tue, Apr 18, 2023 at 12:40 PM Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx> wrote: > > > > On Mon, Apr 17, 2023, Avichal Rakesh wrote: > > > On Mon, Apr 17, 2023 at 7:49 PM Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx> wrote: > > > > > > > > On Mon, Apr 17, 2023, Avichal Rakesh wrote: > > > > > > > > > > On 4/17/23 06:49, Greg KH wrote: > > > > > > On Fri, Apr 14, 2023 at 02:03:02PM -0700, Avichal Rakesh wrote: > > > > > > > > > > > >> This problem may be further exaggerated by the DWC3 controller driver > > > > > >> (which is what my device has) not setting the IMI flag when > > > > > >> no_interrupt flag is set > > > > > >> (https://urldefense.com/v3/__https://lore.kernel.org/all/ced336c84434571340c07994e3667a0ee284fefe.1666735451.git.Thinh.Nguyen@xxxxxxxxxxxx/)?__;!!A4F2R9G_pg!fB0krUX6qsOXrBsOHzLstVFWqTLL9ncwGFXxlxK0eedLY_29XdwjXRtbO-EWv1eX5okN1rOeBZspZ21KSb5b$ > > > > > >> UVC Gadget Driver sets the no_interrupt flag for ~3/4 of its queued > > > > > >> usb_request, so an ISOC failure may not immediately interrupt the UVC > > > > > >> gadget driver, leaving more time for the frame to finish encoding. > > > > > >> > > > > > >> I couldn't find any concrete error handling rules in the UVC specs, so > > > > > >> I am not sure what the proper solution here is. To try out, I created > > > > > >> a patch (attached below) that dequeues all queued usb_requests from > > > > > >> the endpoint in case of an ISOC failure and clears the uvc buffer > > > > > >> queue. This eliminated the partial frames with no perceivable frame > > > > > >> drops. > > > > > >> > > > > > >> So my questions here are: > > > > > >> 1. Is this a known issue, and if so are there workarounds for it? > > > > > >> 2. If the answer to above is "No", does the explanation and mitigation > > > > > >> seem reasonable? > > > > > >> > > > > > >> Patch follows (mostly for illustration, I can formalize it if > > > > > >> needed!). It adds a new 'req_inflight' list to track queued > > > > > >> usb_requests that have not been given back to the gadget driver and > > > > > >> drops all the queued requests in case of an ISOC failure. The other > > > > > >> changes are for the extra bookkeeping required to handle dropping all > > > > > >> frames. I haven't been able to confirm it, but as far as I can tell > > > > > >> the issue exists at ToT as well. > > > > > >> > > > > > > > > Perhaps this conversation with Jeff may explain the issues you observed: > > > > https://urldefense.com/v3/__https://lore.kernel.org/all/20221021164349.fft4yqnxuztsqdeu@xxxxxxxxxxxx/__;!!A4F2R9G_pg!eK3VyAq7vX70vA-WJLA6_bPMbZFx0n33wH39JItHzwCNVqKSkN2aG0izF0GZPWqYxkgL-fNinWlIW71HbGGC$ > > > > > > > > To summarized the long conversation, the UVC needs to maintain a > > > > constant queue of usb requests. Each request is scheduled for a specific > > > > interval. If the UVC couldn't keep up feeding requests to the > > > > controller, then we will have stale requests and missed isoc. > > > > > > > > From the discussion and review, the UVC couldn't queue requests fast > > > > enough. The problem is exacerbated when you reduce the interrupt > > > > frequency with no_interrupt setting. The dwc3 driver has some basic > > > > mechanism to detect for underrun by checking if the queue is empty, but > > > > that's not enough to workaround it. > > > > > > > > Your suggestion to dequeue the request would mean the controller driver > > > > will reschedule the isoc transfer again, which reschedule the next > > > > request for a new interval (potentially avoid being stale). That's fine, > > > > but that may not be a great solution because we're still playing catch > > > > up and the missed isoc already happened. > > > > > > > > You may try to do the followings to mitigate this issue: > > > > 1) Make sure UVC to only queuing rate is constant. Perhaps queue dummy > > > > requests in between time gaps if need be? > > > > > > > This is extremely helpful, thank you! I have a somewhat basic > > > question: For an isoc request, is its "freshness" determined from the > > > time at which usb_ep_queue is called, or from the time at which the > > > previous request was transferred to the host? If the "freshness" is > > > determined from the time 'usb_ep_request' was called, I wonder if the > > > "underrun" is artificial because UVC Gadget driver pumps all free > > > usb_requests at once, and then waits around doing nothing while the > > > usb_requests get stale in the controller's queue. In this case, just > > > slowing the encode loop to wait a little before queuing more requests > > > should do effectively what you suggest above? > > > > > > > Here's a simplified version of how it works: > > (Note that I'll be talking/using usb request as if it's TRBs and also > > ignore SG. I will also only focus about the current handling of the > > gadget driver and dwc3 driver.) > > I appreciate you taking the time to explain this. This is very > enlightening, thank you! > > > > > The controller tracks the current uframe. For isoc endpoint, when you > > call usb_ep_queue(), the controller doesn't consume the request right > > away. The request is scheduled for a specific interval. For UVC, an > > interval is a single uframe (125us). As the current uframe keeps > > incrementing, each request on a queue is consumed. If there's no request > > available for the current uframe, the next queued request is considered > > stale or expired. This case is considered underrun. > > > > "freshness" means the request is scheduled for a future uframe. "stale" > > means the request is queued for a past uframe. Isoc data is time > > sensitive. So if the data isn't gone out at a specific time, then it's > > dropped. > > > > The problem with many gadget drivers that use isoc endpoint is that they > > only queue more requests when they get notified that the previous set of > > requests are completed. This creates time gaps which can starve the > > controller of more requests. > > > > We have some basic mechanism in dwc3 to reschedule new requests when > > there's missed isoc and the queue is empty for UVC. However that's not > > enough. The time the controller consumes the request and the time the > > driver handles the interrupt is different. The driver may not know that > > queue is empty until it's already too late. The gadget driver just needs > > to make sure to keep the request queue to at least X amount of requests. > > Note that 125us is relatively fast. > > I see, and I think I understand Greg's previous comment better as > well: The UVC driver isn't falling behind on the video stream, it is > falling behind the usb controller's monotonic isoc stream. > > From what I can see, this leaves us in an interesting place: UVC > allows the host to configure the camera's output resolution and fps, > which effectively controls how fast the camera is generating data. > This is at odds with the UVC gadget driver, which currently packs each > video frame into as few usb_requests as possible (using the full Hm... I would spread the data to more requests and not put all the eggs in one basket. It shouldn't be an issue with dwc3 controller, but some hosts aren't the greatest when handling isoc at high transfer rate. > available size in usb_requests). Effectively, the UVC gadget driver > attempts to use the "full" bandwidth of isoc transfers even when the > camera isn't generating data fast enough. For example, in my > observations: 1 video frame is ~22kB. At 30fps, this represents 1/30 > of the amount of data the camera would generate in a second. This 22kB > is split into 8 usb_requests which is about 1/1000 the number of > requests UVC driver needs to generate per second to prevent isoc > failures (assuming 125us monotonic uframes). Assuming some fudge > factor from the simplifications in your explanation gives the uvc > driver some extra leeway with request queuing, we're still roughly two > order of magnitudes out of sync. Even with perfect 'complete' > callbacks and video frame encodings, an underrun seems inevitable. > Data is being generated at a far slower rate than it is being > transferred. Does this reasoning seem valid? > > Just as a test I'll try updating the UVC driver to consume 266 > usb_requests per video frame (~1/30 of 8000), which should be enough > to keep the usb controller queue occupied for ~1/30s. Ideally, by the > time the controller queue is empty, the camera would have produced a You mean queue 266 requests at once? This may work. There should be a large enough time gap so that the dwc3 driver can properly detect an empty queue to be able to reschedule new requests for the next video frame. > new frame. This doesn't solve the issue with latencies around callback > and an isoc failure might still happen, hopefully the failure > frequency is reduced because UVC queues enough requests per video > frame to not starve the controller's queue while waiting on a new > frame and the only way they go out of sync is from 'complete' callback > timings. I am assuming this has been tried before, but my LKML search > skills are failing and I can't find much on it. > > > > > Note that this talking point is assuming that the UVC host behaving > > properly and poll for data every uframe. If not, it's a different issue. > > > > > > 2) Enhance dwc3 to detect underrun from UVC on request queue. ie. If the > > > > queue is empty and a new request is queue, reschedule the isoc transfer. > > > > This will break some other devices if time guarantee is required. So > > > > perhaps an additional flag is needed for this change of behavior. > > > > > > > > > > I am not as familiar with the DWC3, but just to clarify: Is your > > > suggestion to reschedule the stale requests if the controller's send > > > queue is empty (underrun) and the UVC gadget driver queues a new > > > request? Depending on how many stale requests there are, would that > > > run the risk of timing out the new request as well? > > > > > > > No, rescheduling here means all the queued requests will be scheduled > > for a new future uframe. But also note that the queue empty check is > > insufficient as explained above. So this would only mitigate the issue. > > > > This makes a lot more sense after your explanation, thank you! > np. BR, Thinh