Re: UVC Gadget Driver shows glitched frames with a Linux host

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Apr 18, 2023 at 5:19 PM Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx> wrote:
>
> On Wed, Apr 19, 2023, Thinh Nguyen wrote:
> > On Tue, Apr 18, 2023, Avichal Rakesh wrote:
> > > On Tue, Apr 18, 2023 at 12:40 PM Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx> wrote:
> > > >
> > > > On Mon, Apr 17, 2023, Avichal Rakesh wrote:
> > > > > On Mon, Apr 17, 2023 at 7:49 PM Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Mon, Apr 17, 2023, Avichal Rakesh wrote:
> > > > > > >
> > > > > > > On 4/17/23 06:49, Greg KH wrote:
> > > > > > > > On Fri, Apr 14, 2023 at 02:03:02PM -0700, Avichal Rakesh wrote:
> > > > > > > >
> > > > > > > >> This problem may be further exaggerated by the DWC3 controller driver
> > > > > > > >> (which is what my device has) not setting the IMI flag when
> > > > > > > >> no_interrupt flag is set
> > > > > > > >> (https://urldefense.com/v3/__https://lore.kernel.org/all/ced336c84434571340c07994e3667a0ee284fefe.1666735451.git.Thinh.Nguyen@xxxxxxxxxxxx/)?__;!!A4F2R9G_pg!fB0krUX6qsOXrBsOHzLstVFWqTLL9ncwGFXxlxK0eedLY_29XdwjXRtbO-EWv1eX5okN1rOeBZspZ21KSb5b$
> > > > > > > >> UVC Gadget Driver sets the no_interrupt flag for ~3/4 of its queued
> > > > > > > >> usb_request, so an ISOC failure may not immediately interrupt the UVC
> > > > > > > >> gadget driver, leaving more time for the frame to finish encoding.
> > > > > > > >>
> > > > > > > >> I couldn't find any concrete error handling rules in the UVC specs, so
> > > > > > > >> I am not sure what the proper solution here is. To try out, I created
> > > > > > > >> a patch (attached below) that dequeues all queued usb_requests from
> > > > > > > >> the endpoint in case of an ISOC failure and clears the uvc buffer
> > > > > > > >> queue. This eliminated the partial frames with no perceivable frame
> > > > > > > >> drops.
> > > > > > > >>
> > > > > > > >> So my questions here are:
> > > > > > > >> 1. Is this a known issue, and if so are there workarounds for it?
> > > > > > > >> 2. If the answer to above is "No", does the explanation and mitigation
> > > > > > > >> seem reasonable?
> > > > > > > >>
> > > > > > > >> Patch follows (mostly for illustration, I can formalize it if
> > > > > > > >> needed!). It adds a new 'req_inflight' list to track queued
> > > > > > > >> usb_requests that have not been given back to the gadget driver and
> > > > > > > >> drops all the queued requests in case of an ISOC failure. The other
> > > > > > > >> changes are for the extra bookkeeping required to handle dropping all
> > > > > > > >> frames. I haven't been able to confirm it, but as far as I can tell
> > > > > > > >> the issue exists at ToT as well.
> > > > > > > >>
> > > > > >
> > > > > > Perhaps this conversation with Jeff may explain the issues you observed:
> > > > > > https://urldefense.com/v3/__https://lore.kernel.org/all/20221021164349.fft4yqnxuztsqdeu@xxxxxxxxxxxx/__;!!A4F2R9G_pg!eK3VyAq7vX70vA-WJLA6_bPMbZFx0n33wH39JItHzwCNVqKSkN2aG0izF0GZPWqYxkgL-fNinWlIW71HbGGC$
> > > > > >
> > > > > > To summarized the long conversation, the UVC needs to maintain a
> > > > > > constant queue of usb requests. Each request is scheduled for a specific
> > > > > > interval. If the UVC couldn't keep up feeding requests to the
> > > > > > controller, then we will have stale requests and missed isoc.
> > > > > >
> > > > > > From the discussion and review, the UVC couldn't queue requests fast
> > > > > > enough. The problem is exacerbated when you reduce the interrupt
> > > > > > frequency with no_interrupt setting. The dwc3 driver has some basic
> > > > > > mechanism to detect for underrun by checking if the queue is empty, but
> > > > > > that's not enough to workaround it.
> > > > > >
> > > > > > Your suggestion to dequeue the request would mean the controller driver
> > > > > > will reschedule the isoc transfer again, which reschedule the next
> > > > > > request for a new interval (potentially avoid being stale). That's fine,
> > > > > > but that may not be a great solution because we're still playing catch
> > > > > > up and the missed isoc already happened.
> > > > > >
> > > > > > You may try to do the followings to mitigate this issue:
> > > > > > 1) Make sure UVC to only queuing rate is constant. Perhaps queue dummy
> > > > > >    requests in between time gaps if need be?
> > > > > >
> > > > > This is extremely helpful, thank you! I have a somewhat basic
> > > > > question: For an isoc request, is its "freshness" determined from the
> > > > > time at which usb_ep_queue is called, or from the time at which the
> > > > > previous request was transferred to the host? If the "freshness" is
> > > > > determined from the time 'usb_ep_request' was called, I wonder if the
> > > > > "underrun" is artificial because UVC Gadget driver pumps all free
> > > > > usb_requests at once, and then waits around doing nothing while the
> > > > > usb_requests get stale in the controller's queue. In this case, just
> > > > > slowing the encode loop to wait a little before queuing more requests
> > > > > should do effectively what you suggest above?
> > > > >
> > > >
> > > > Here's a simplified version of how it works:
> > > > (Note that I'll be talking/using usb request as if it's TRBs and also
> > > > ignore SG. I will also only focus about the current handling of the
> > > > gadget driver and dwc3 driver.)
> > >
> > > I appreciate you taking the time to explain this. This is very
> > > enlightening, thank you!
> > >
> > > >
> > > > The controller tracks the current uframe. For isoc endpoint, when you
> > > > call usb_ep_queue(), the controller doesn't consume the request right
> > > > away. The request is scheduled for a specific interval. For UVC, an
> > > > interval is a single uframe (125us). As the current uframe keeps
> > > > incrementing, each request on a queue is consumed. If there's no request
> > > > available for the current uframe, the next queued request is considered
> > > > stale or expired. This case is considered underrun.
> > > >
> > > > "freshness" means the request is scheduled for a future uframe. "stale"
> > > > means the request is queued for a past uframe. Isoc data is time
> > > > sensitive. So if the data isn't gone out at a specific time, then it's
> > > > dropped.
> > > >
> > > > The problem with many gadget drivers that use isoc endpoint is that they
> > > > only queue more requests when they get notified that the previous set of
> > > > requests are completed. This creates time gaps which can starve the
> > > > controller of more requests.
> > > >
> > > > We have some basic mechanism in dwc3 to reschedule new requests when
> > > > there's missed isoc and the queue is empty for UVC. However that's not
> > > > enough. The time the controller consumes the request and the time the
> > > > driver handles the interrupt is different. The driver may not know that
> > > > queue is empty until it's already too late. The gadget driver just needs
> > > > to make sure to keep the request queue to at least X amount of requests.
> > > > Note that 125us is relatively fast.
> > >
> > > I see, and I think I understand Greg's previous comment better as
> > > well: The UVC driver isn't falling behind on the video stream, it is
> > > falling behind the usb controller's monotonic isoc stream.
> > >
> > > From what I can see, this leaves us in an interesting place: UVC
> > > allows the host to configure the camera's output resolution and fps,
> > > which effectively controls how fast the camera is generating data.
> > > This is at odds with the UVC gadget driver, which currently packs each
> > > video frame into as few usb_requests as possible (using the full
> >
> > Hm... I would spread the data to more requests and not put all the eggs
> > in one basket. It shouldn't be an issue with dwc3 controller, but some
> > hosts aren't the greatest when handling isoc at high transfer rate.
> >
> > > available size in usb_requests). Effectively, the UVC gadget driver
> > > attempts to use the "full" bandwidth of isoc transfers even when the
> > > camera isn't generating data fast enough. For example, in my
> > > observations: 1 video frame is ~22kB. At 30fps, this represents 1/30
> > > of the amount of data the camera would generate in a second. This 22kB
> > > is split into 8 usb_requests which is about 1/1000 the number of
> > > requests UVC driver needs to generate per second to prevent isoc
> > > failures (assuming 125us monotonic uframes). Assuming some fudge
> > > factor from the simplifications in your explanation gives the uvc
> > > driver some extra leeway with request queuing, we're still roughly two
> > > order of magnitudes out of sync. Even with perfect 'complete'
> > > callbacks and video frame encodings, an underrun seems inevitable.
> > > Data is being generated at a far slower rate than it is being
> > > transferred. Does this reasoning seem valid?
> > >
> > > Just as a test I'll try updating the UVC driver to consume 266
> > > usb_requests per video frame (~1/30 of 8000), which should be enough
> > > to keep the usb controller queue occupied for ~1/30s. Ideally, by the
> > > time the controller queue is empty, the camera would have produced a
> >
> > You mean queue 266 requests at once? This may work. There should be a
> > large enough time gap so that the dwc3 driver can properly detect an
> > empty queue to be able to reschedule new requests for the next video
> > frame.
>
> That's a lot of requests to allocate. Can't you just make sure to notify
> the gadget driver more often to requeue requests and don't set
> no_interrupt as often?

Oh yes, I am hoping UVC gadget driver and the usb controller can reach
a steady state of pumping out requests with less than 266 requests
allocated. I will play around and see how many requests it takes to
reach the steady state. I think the encode loop can maintain the isoc
deadlines as long as there are free requests available.

>
> >
> > > new frame. This doesn't solve the issue with latencies around callback
> > > and an isoc failure might still happen, hopefully the failure
> > > frequency is reduced because UVC queues enough requests per video
> > > frame to not starve the controller's queue while waiting on a new
> > > frame and the only way they go out of sync is from 'complete' callback
> > > timings. I am assuming this has been tried before, but my LKML search
> > > skills are failing and I can't find much on it.
> > >
> > > >
>
> BR,
> Thinh



-- 
- Avi.




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux