On Wed, Apr 19, 2023, Thinh Nguyen wrote: > On Tue, Apr 18, 2023, Avichal Rakesh wrote: > > On Tue, Apr 18, 2023 at 12:40 PM Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx> wrote: > > > > > > On Mon, Apr 17, 2023, Avichal Rakesh wrote: > > > > On Mon, Apr 17, 2023 at 7:49 PM Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx> wrote: > > > > > > > > > > On Mon, Apr 17, 2023, Avichal Rakesh wrote: > > > > > > > > > > > > On 4/17/23 06:49, Greg KH wrote: > > > > > > > On Fri, Apr 14, 2023 at 02:03:02PM -0700, Avichal Rakesh wrote: > > > > > > > > > > > > > >> This problem may be further exaggerated by the DWC3 controller driver > > > > > > >> (which is what my device has) not setting the IMI flag when > > > > > > >> no_interrupt flag is set > > > > > > >> (https://urldefense.com/v3/__https://lore.kernel.org/all/ced336c84434571340c07994e3667a0ee284fefe.1666735451.git.Thinh.Nguyen@xxxxxxxxxxxx/)?__;!!A4F2R9G_pg!fB0krUX6qsOXrBsOHzLstVFWqTLL9ncwGFXxlxK0eedLY_29XdwjXRtbO-EWv1eX5okN1rOeBZspZ21KSb5b$ > > > > > > >> UVC Gadget Driver sets the no_interrupt flag for ~3/4 of its queued > > > > > > >> usb_request, so an ISOC failure may not immediately interrupt the UVC > > > > > > >> gadget driver, leaving more time for the frame to finish encoding. > > > > > > >> > > > > > > >> I couldn't find any concrete error handling rules in the UVC specs, so > > > > > > >> I am not sure what the proper solution here is. To try out, I created > > > > > > >> a patch (attached below) that dequeues all queued usb_requests from > > > > > > >> the endpoint in case of an ISOC failure and clears the uvc buffer > > > > > > >> queue. This eliminated the partial frames with no perceivable frame > > > > > > >> drops. > > > > > > >> > > > > > > >> So my questions here are: > > > > > > >> 1. Is this a known issue, and if so are there workarounds for it? > > > > > > >> 2. If the answer to above is "No", does the explanation and mitigation > > > > > > >> seem reasonable? > > > > > > >> > > > > > > >> Patch follows (mostly for illustration, I can formalize it if > > > > > > >> needed!). It adds a new 'req_inflight' list to track queued > > > > > > >> usb_requests that have not been given back to the gadget driver and > > > > > > >> drops all the queued requests in case of an ISOC failure. The other > > > > > > >> changes are for the extra bookkeeping required to handle dropping all > > > > > > >> frames. I haven't been able to confirm it, but as far as I can tell > > > > > > >> the issue exists at ToT as well. > > > > > > >> > > > > > > > > > > Perhaps this conversation with Jeff may explain the issues you observed: > > > > > https://urldefense.com/v3/__https://lore.kernel.org/all/20221021164349.fft4yqnxuztsqdeu@xxxxxxxxxxxx/__;!!A4F2R9G_pg!eK3VyAq7vX70vA-WJLA6_bPMbZFx0n33wH39JItHzwCNVqKSkN2aG0izF0GZPWqYxkgL-fNinWlIW71HbGGC$ > > > > > > > > > > To summarized the long conversation, the UVC needs to maintain a > > > > > constant queue of usb requests. Each request is scheduled for a specific > > > > > interval. If the UVC couldn't keep up feeding requests to the > > > > > controller, then we will have stale requests and missed isoc. > > > > > > > > > > From the discussion and review, the UVC couldn't queue requests fast > > > > > enough. The problem is exacerbated when you reduce the interrupt > > > > > frequency with no_interrupt setting. The dwc3 driver has some basic > > > > > mechanism to detect for underrun by checking if the queue is empty, but > > > > > that's not enough to workaround it. > > > > > > > > > > Your suggestion to dequeue the request would mean the controller driver > > > > > will reschedule the isoc transfer again, which reschedule the next > > > > > request for a new interval (potentially avoid being stale). That's fine, > > > > > but that may not be a great solution because we're still playing catch > > > > > up and the missed isoc already happened. > > > > > > > > > > You may try to do the followings to mitigate this issue: > > > > > 1) Make sure UVC to only queuing rate is constant. Perhaps queue dummy > > > > > requests in between time gaps if need be? > > > > > > > > > This is extremely helpful, thank you! I have a somewhat basic > > > > question: For an isoc request, is its "freshness" determined from the > > > > time at which usb_ep_queue is called, or from the time at which the > > > > previous request was transferred to the host? If the "freshness" is > > > > determined from the time 'usb_ep_request' was called, I wonder if the > > > > "underrun" is artificial because UVC Gadget driver pumps all free > > > > usb_requests at once, and then waits around doing nothing while the > > > > usb_requests get stale in the controller's queue. In this case, just > > > > slowing the encode loop to wait a little before queuing more requests > > > > should do effectively what you suggest above? > > > > > > > > > > Here's a simplified version of how it works: > > > (Note that I'll be talking/using usb request as if it's TRBs and also > > > ignore SG. I will also only focus about the current handling of the > > > gadget driver and dwc3 driver.) > > > > I appreciate you taking the time to explain this. This is very > > enlightening, thank you! > > > > > > > > The controller tracks the current uframe. For isoc endpoint, when you > > > call usb_ep_queue(), the controller doesn't consume the request right > > > away. The request is scheduled for a specific interval. For UVC, an > > > interval is a single uframe (125us). As the current uframe keeps > > > incrementing, each request on a queue is consumed. If there's no request > > > available for the current uframe, the next queued request is considered > > > stale or expired. This case is considered underrun. > > > > > > "freshness" means the request is scheduled for a future uframe. "stale" > > > means the request is queued for a past uframe. Isoc data is time > > > sensitive. So if the data isn't gone out at a specific time, then it's > > > dropped. > > > > > > The problem with many gadget drivers that use isoc endpoint is that they > > > only queue more requests when they get notified that the previous set of > > > requests are completed. This creates time gaps which can starve the > > > controller of more requests. > > > > > > We have some basic mechanism in dwc3 to reschedule new requests when > > > there's missed isoc and the queue is empty for UVC. However that's not > > > enough. The time the controller consumes the request and the time the > > > driver handles the interrupt is different. The driver may not know that > > > queue is empty until it's already too late. The gadget driver just needs > > > to make sure to keep the request queue to at least X amount of requests. > > > Note that 125us is relatively fast. > > > > I see, and I think I understand Greg's previous comment better as > > well: The UVC driver isn't falling behind on the video stream, it is > > falling behind the usb controller's monotonic isoc stream. > > > > From what I can see, this leaves us in an interesting place: UVC > > allows the host to configure the camera's output resolution and fps, > > which effectively controls how fast the camera is generating data. > > This is at odds with the UVC gadget driver, which currently packs each > > video frame into as few usb_requests as possible (using the full > > Hm... I would spread the data to more requests and not put all the eggs > in one basket. It shouldn't be an issue with dwc3 controller, but some > hosts aren't the greatest when handling isoc at high transfer rate. > > > available size in usb_requests). Effectively, the UVC gadget driver > > attempts to use the "full" bandwidth of isoc transfers even when the > > camera isn't generating data fast enough. For example, in my > > observations: 1 video frame is ~22kB. At 30fps, this represents 1/30 > > of the amount of data the camera would generate in a second. This 22kB > > is split into 8 usb_requests which is about 1/1000 the number of > > requests UVC driver needs to generate per second to prevent isoc > > failures (assuming 125us monotonic uframes). Assuming some fudge > > factor from the simplifications in your explanation gives the uvc > > driver some extra leeway with request queuing, we're still roughly two > > order of magnitudes out of sync. Even with perfect 'complete' > > callbacks and video frame encodings, an underrun seems inevitable. > > Data is being generated at a far slower rate than it is being > > transferred. Does this reasoning seem valid? > > > > Just as a test I'll try updating the UVC driver to consume 266 > > usb_requests per video frame (~1/30 of 8000), which should be enough > > to keep the usb controller queue occupied for ~1/30s. Ideally, by the > > time the controller queue is empty, the camera would have produced a > > You mean queue 266 requests at once? This may work. There should be a > large enough time gap so that the dwc3 driver can properly detect an > empty queue to be able to reschedule new requests for the next video > frame. That's a lot of requests to allocate. Can't you just make sure to notify the gadget driver more often to requeue requests and don't set no_interrupt as often? > > > new frame. This doesn't solve the issue with latencies around callback > > and an isoc failure might still happen, hopefully the failure > > frequency is reduced because UVC queues enough requests per video > > frame to not starve the controller's queue while waiting on a new > > frame and the only way they go out of sync is from 'complete' callback > > timings. I am assuming this has been tried before, but my LKML search > > skills are failing and I can't find much on it. > > > > > BR, Thinh