Re: UVC Gadget Driver shows glitched frames with a Linux host

On Tue, May 09, 2023, Avichal Rakesh wrote:
> On Mon, May 8, 2023 at 5:21 PM Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx> wrote:
> >
> > On Mon, May 08, 2023, Avichal Rakesh wrote:
> > > On Sat, May 6, 2023 at 5:53 AM Laurent Pinchart
> > > <laurent.pinchart@xxxxxxxxxxxxxxxx> wrote:
> > >
> > > > > The first thing I tried was to split one video frame over 266 frames, without
> > > > > changing the number of requests allocated. And it works! However, as Laurent
> > > > > mentioned, it does add a fair amount of bookkeeping to split a video frame into
> > > > > the required number of requests. I also hardcoded the number 266 from our
> > > > > discussion, but I am not sure how to figure out that number dynamically. 266
> > > > > also didn't work if the host started sending frames at more than 30fps :/, so
> > > > > our dynamic calculation would need to take the camera's real output fps
> > > > > into account, which as far as I can tell is not known to the UVC driver.
> > > >
> > > > It would probably need to monitor how full the request queue is, and
> > > > adapt the number of bytes it queues in each request accordingly. That's
> > > > indeed quite a bit of work, for little gain compared to the option you
> > > > describe below.
> > > >
> > > Agreed, especially if the hosts already handle 0 length packets.
> > > As long as the usb controllers can keep up, the burst approach seems
> > > more reasonable.
> > >
> > > > > Given those issues, I tried what Laurent called the "burst" approach
> > > > > (attached below), i.e. send the video frames in as few packets as possible,
> > > > > and then queue up 0-length packets to keep the ISOC queue happy. This approach
> > > > > works perfectly as far as I can tell. Locally I tried with Linux, Windows,
> > > > > and macOS hosts with no frame drops or ISOC failures on any of them!
> > > > >
> > > > > In the current patch, the UVC gadget driver keeps the ISOC cadence by
> > > > > effectively maintaining back-pressure on the USB controller (at least to
> > > > > the best of its capabilities). Any usb_request available to the UVC gadget
> > > > > gets immediately queued back to the USB controller. If a video frame is
> > > > > available, the frame is encoded; if not, the request length is set to 0.
> > > > > The idea is that the host's polling and the controller's 'complete'
> > > > > callback will result in a somewhat consistent cadence for the uvc driver
> > > > > after the initial burst of packets.
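
For illustration, the requeue-on-complete approach described above could look
roughly like the sketch below. This is not the actual patch: locking and most
error handling are omitted, and uvcg_frame_pending(), uvcg_encode_frame() and
the req_count field are placeholder names, not existing code.

	/* Illustrative sketch only, not the actual patch. */
	static void uvc_isoc_complete(struct usb_ep *ep, struct usb_request *req)
	{
		struct uvc_video *video = req->context;

		if (req->status == -ESHUTDOWN)
			return;		/* endpoint is being disabled */

		if (uvcg_frame_pending(video)) {
			/* Video data is available: fill the request with payload. */
			req->length = uvcg_encode_frame(video, req->buf,
							video->max_payload_size);
		} else {
			/* Nothing to send: a 0-length request keeps the isoc cadence. */
			req->length = 0;
		}

		/* Only ask for a completion interrupt on every ~16th request. */
		req->no_interrupt = !!(++video->req_count % 16);

		/* Hand the request straight back to the controller. */
		if (usb_ep_queue(ep, req, GFP_ATOMIC) < 0)
			pr_err("uvc: failed to requeue isoc request\n");
	}
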
> > > > >
> > > > > However, this does mean that at worst a new video frame can be up to 63
> > > > > usb_requests behind, but assuming 125us per usb_request, that amounts to
> > > > > ~8ms of latency at worst, which seems acceptable?
> > > >
> > > > There's a trade off between latency and the risk of underruns. We could
> > > > decrease the number of queued requests to lower the latency, as long as
> > > > we ensure the margin is high enough to avoid underruns in higher load
> > > > conditions. We could also do so only when queuing 0-size requests, and
> > > > queue the data in burst mode with a higher number of requests.
> > >
> > > Would 8ms of latency be considered significant? Unless the host asks
> > > for >125fps, that amounts to less than a frame of latency, so frames should
> > > not be dropped by the host for being "late". Admittedly, I don't know enough
> > > about UVC usage to say if 8ms (at worst) will be problematic for certain
> > > usages. The hosts don't seem to have any issues when streaming at <=60fps.
> > >
> >
> > Do you only queue a single burst at a time? We don't have to queue all
> > 63 0-length requests as a single "burst" right? We can queue multiple
> > smaller bursts of 0-length requests. This way, the UVC driver can be
> > interrupted more often, reducing the video data latency if it arrives
> > earlier. This keeps the total number of queued requests smaller while
> > ensuring that the controller isn't starved of requests (as long as this
> > smaller burst accounts for the system latency). The tradeoff here is
> > just the UVC gadget driver needs to handle request completion a little
> > more often. However, we are less likely to hit an underrun regardless of
> > the video's fps.
> 
> The patch does currently queue all 64 0-length packets from the start,
> and then relies on completion callbacks to maintain a steady state. It
> still sets the `no_interrupt` flag, so the completion callback only
> fires for every ~16th request (at worst), which is the same as before.

I see, that's good. I thought only 1 out of the 64 requests would have a
completion interrupt.

> Ideally, the UVC driver never sits on a video frame for longer than 16
> requests, but it does keep the controller queue as full as it can,
> which means the video frame could be stuck behind a bunch of 0-length
> packets and could hypothetically be sent faster if we were to be
> smarter with queuing the 0-length requests. I say hypothetically,
> because I have been unable to capture any latency differences with and
> without the patch.
> 
> I've been trying to think of some way to account for system latency
> and only queue a smaller set of 0 length packets, but most of them
> boil down to "hope we have enough requests queued". It would be
> helpful if we could pin some numbers down that can be relied on by the
> gadget driver. For example: Can 125us per request be safely assumed;
> or is there a way to get information about how long a usb_request will
> last for? It would be very helpful if there was a way to query the
> controller's active queue, but I couldn't really find anything in the
> APIs. We can't really rely on the complete callbacks because of the
> "no_interrupt" flag. I'll try out some things, and get back. If you
> have any ideas, I would love to hear them too!
> 

If "no_interrupt" is set, it just means the gadget driver will not get a
notification of the request completion. It may help to think in term of
the moving current uframe since it's predictable and steadily
progressing. Each queued request is earmarked for a future uframe. Once
the current uframe matches the scheduled uframe of the request, the
request is completed regardless there's a missed isoc or whether the
no_interrupt is set (granted it's not stale).
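
A toy model of that idea, purely for illustration (this is not how any real
controller driver is written; toy_isoc_req and toy_uframe_tick are made-up
names):

	/* Toy model only: one request earmarked per uframe. */
	struct toy_isoc_req {
		unsigned int	scheduled_uframe;
		bool		no_interrupt;
		bool		completed;
	};

	/* Imagine this running once per uframe, i.e. every 125us at high speed. */
	static void toy_uframe_tick(struct toy_isoc_req *reqs, unsigned int depth,
				    unsigned int current_uframe)
	{
		unsigned int i;

		for (i = 0; i < depth; i++) {
			if (reqs[i].completed ||
			    reqs[i].scheduled_uframe != current_uframe)
				continue;
			/*
			 * The request is done now, whether it carried data, was
			 * 0-length, or ended in a missed isoc. no_interrupt only
			 * decides whether the gadget driver gets notified here.
			 */
			reqs[i].completed = true;
		}
	}
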

System latency introduces delay between the request completing and the
gadget driver handling the completion interrupt. In other words, we
*know* the minimum time for the request completion, but we can't
guarantee when the gadget driver would receive and finish handling the
request completion interrupt. This varies between different setups. An
error due to system latency should not be a frequent occurrence. As long
as it's within an acceptable threshold, it should be fine. I think your
8ms delay is fine; possibly 4ms may be more than enough. I haven't had
to deal with more than 3ms of delay in our hardware testing (with
multiple endpoints and devices).
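
As a rough way to size the queue against a latency budget (assuming one
request per 125us high-speed uframe, as above; UVC_UFRAME_US and
uvc_reqs_for_budget are just names for the sketch):

	#define UVC_UFRAME_US	125	/* one isoc request per high-speed uframe */

	/* Number of 0-length requests needed to cover a given latency budget. */
	static unsigned int uvc_reqs_for_budget(unsigned int budget_us)
	{
		return DIV_ROUND_UP(budget_us, UVC_UFRAME_US);
	}

	/*
	 * 8000us -> 64 requests (roughly the current queue depth)
	 * 4000us -> 32 requests
	 * 3000us -> 24 requests (around the worst delay seen in our testing)
	 */
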

I hope this info is new and useful, otherwise I may sound like a broken
record.

Note: we can always enhance UVC again should anyone report new issues
in their testing once your change is upstreamed.

BR,
Thinh



