Re: uvc gadget: Making upper bound of number of usb requests allocated configurable through configfs

On Wed, Oct 18, 2023 at 03:28:57PM +0200, Michael Grzeschik wrote:
> On Sun, Oct 15, 2023 at 09:33:43PM -0700, Jayant Chowdhary wrote:
> >On 10/12/23 11:50, Thinh Nguyen wrote:
> >> On Mon, Oct 09, 2023, Jayant Chowdhary wrote:
> >>>> On Fri, Oct 06, 2023 at 03:03:56PM -0700, Jayant Chowdhary wrote:
> >>>>> We had been seeing the UVC gadget driver receive isoc errors while
> >>>>> sending packets to the usb endpoint - resulting in glitches being shown
> >>>>> on linux hosts. My colleague Avichal Rakesh and others had a very
> >>>>> enlightening discussion at
> >>>>> https://lore.kernel.org/linux-usb/8741b7cb-54ec-410b-caf5-697f81e8ad64@xxxxxxxxxx/T/
> >>>>>
> >>>>> The conclusion that we came to was: usb requests with actual uvc frame
> >>>>> data were missing their scheduled uframes in the usb controller. As a
> >>>>> mitigation, we started sending 0 length usb requests when there was no
> >>>>> uvc frame buffer available to get data from. Even with this mitigation,
> >>>>> we are seeing glitches - albeit at a lower frequency.
> >>>>>
> >>>>> After some investigation, it is seen that we’re getting isoc errors when
> >>>>> the worker thread serving video_pump() work items doesn’t get scheduled
> >>>>> for longer periods of time than usual - in most cases > 6ms.
> >>>>> This is close enough to the 8ms limit that we have when the number of usb
> >>>>> requests in the queue is 64 (since we have a 125us uframe period). In order
> >>>>> to tolerate the scheduling delays better, it helps to increase the number of
> >>>>> usb requests in the queue. In that case, we have more 0 length requests
> >>>>> given to the udc driver - and as a result we can wait longer for uvc
> >>>>> frames with valid data to get processed by video_pump(). I’m attaching a
> >>>>> patch which lets one configure the upper bound on the number of usb
> >>>>> requests allocated through configfs. Please let me know your thoughts.
> >>>>> I can formalize the patch if it looks okay.
> >>>> Why do you want to limit the upper bound?  Why not just not submit so
> >>>> many requests from userspace as you control that, right?
> >>>
> >>> Userspace negotiates a video frame rate (typically 30/60fps) with the host that does
> >>> not necessarily correspond to the ISOC cadence. After the
> >>> patch at https://lkml.org/lkml/diff/2023/5/8/1115/1 was submitted, we are
> >>> maintaining back pressure on the usb controller even if we do not have uvc frame
> >>> data, by sending the controller 0 length requests (as long as usb_requests are
> >>> available). Also, even if the userspace application were to somehow produce
> >>> data to match the ISOC rate, it would need to have information about USB
> >>> timing details - which I am not sure is available to userspace or is the right
> >>> thing to do here?
> >>>
> >>> Here, we are trying to handle the scenario in which the video_pump() worker
> >>> thread does not get scheduled in time - by increasing the number of usb requests
> >>> allocated in the queue. This would send more usb requests to the usb controller,
> >>> when video_pump() does get scheduled - even if they're 0 length. This buys
> >>> the video_pump() worker thread scheduling time - since more usb requests
> >>> are with the controller, subsequent requests sent will not be 'stale' and
> >>> dropped by the usb controller.
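
For illustration only, a minimal sketch of what exposing such an upper bound
through configfs might look like. The structure, attribute name and bounds
below are assumptions for the sake of the example, not the actual patch:

/*
 * Sketch of a configfs attribute for an upper bound on allocated usb
 * requests. Names and bounds are illustrative assumptions.
 */
#include <linux/kernel.h>
#include <linux/configfs.h>

struct uvcg_video_opts {
	struct config_group group;
	unsigned int num_requests;	/* hypothetical upper bound */
};

static inline struct uvcg_video_opts *to_video_opts(struct config_item *item)
{
	return container_of(to_config_group(item),
			    struct uvcg_video_opts, group);
}

static ssize_t uvcg_video_num_requests_show(struct config_item *item,
					    char *page)
{
	return sprintf(page, "%u\n", to_video_opts(item)->num_requests);
}

static ssize_t uvcg_video_num_requests_store(struct config_item *item,
					     const char *page, size_t count)
{
	struct uvcg_video_opts *opts = to_video_opts(item);
	unsigned int num;
	int ret;

	ret = kstrtouint(page, 0, &num);
	if (ret)
		return ret;

	/* Arbitrary sanity bounds for the sketch. */
	if (num < 4 || num > 256)
		return -EINVAL;

	opts->num_requests = num;
	return count;
}

CONFIGFS_ATTR(uvcg_video_, num_requests);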
> >>>
> >> I believe you're testing against the dwc3 controller, right? I may not be
> >> as familiar with the UVC function driver, but based on the previous
> >> discussions, I think the driver should be able to handle this without
> >> the user input.
> >
> > Yes we are testing against the dwc3 controller.
> >
> >> The frequency of the request submission should not depend on the
> >> video_pump() work thread since it can vary. The frequency of request
> >> submission should match with the request completion. We know that
> >> request completion rate should be fixed (1 uframe/request + when you
> >> don't set no_interrupt). Based on this you can do your calculation on how
> >> often you should set no_interrupt and how many requests you must submit.
> >> You don't have to wait for the video_pump() to submit 0-length requests.
> >>
> >> The only variable here is the completion handler delay or system
> >> latency, which should not be much and should be within your calculation.
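
A rough sketch of that suggestion (the interrupt rate, names and the
0-length fallback policy below are assumptions, not actual driver code):

/*
 * Sketch of completion-driven requeueing: each request is requeued from
 * its own completion handler, zero-length when no frame data is ready.
 */
#include <linux/printk.h>
#include <linux/usb/gadget.h>

#define UVC_INTERRUPT_RATE	16	/* assumed: one IRQ per 16 requests */

static void uvc_video_complete_sketch(struct usb_ep *ep,
				      struct usb_request *req)
{
	static unsigned int count;

	/*
	 * If an encoded frame buffer is available, copy its payload into
	 * req->buf here; otherwise keep back pressure on the controller
	 * with a 0-length request.
	 */
	req->length = 0;

	/*
	 * Coalesce completion interrupts: the controller still completes
	 * one request per uframe, but only raises an IRQ every
	 * UVC_INTERRUPT_RATE requests, bounding interrupt load while
	 * keeping the submission rate matched to the completion rate.
	 */
	req->no_interrupt = (++count % UVC_INTERRUPT_RATE) != 0;

	/* The completion handler runs in interrupt context: GFP_ATOMIC. */
	if (usb_ep_queue(ep, req, GFP_ATOMIC))
		pr_warn("uvc sketch: failed to requeue request\n");
}

With something along these lines, submission is driven by completions rather
than by the worker thread being scheduled, which is the property described
above.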
> >
> > Thanks for the suggestion. It indeed makes sense that we do not completely depend on
> > video_pump() for sending 0 length requests. I was concerned about
> > synchronization needed when we send requests to the dwc3 controller from
> > different threads. I see that the dwc3 controller code does internally serialize
> > queueing requests; can we expect this from other controllers as well?
> >
> > This brings me to another question for Michael - I see
> > that we introduced a worker thread for pumping usb requests to the usb endpoint
> > in https://lore.kernel.org/all/20200427151614.10868-1-m.grzeschik@xxxxxxxxxxxxxx/
> > (I see multiple email addresses, so apologies if I used the incorrect one).
> >
> > Did we introduce the worker thread to solve some specific deadlock scenarios?
> 
> Exactly. This was the reason why we moved to the pump worker.

Correct, but it also has the side effect of lowering the amount of time
spent in interrupt context. It would be potentially problematic to
revert to copying in the completion handler.

> I actually
> looked into the host side implementation of the uvc driver. There we
> also queue a worker from the complete function:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_video.c#n1646

This was implemented for the sole purpose of improving performance, as
the memcpy was too much for a single CPU to handle.

> So this sounded reasonable to me. However we faced issues similar to
> yours and introduced different ways to improve the latency.
> 
> One thing we did to improve the latency was adding WQ_HIGHPRI:
> 
> https://lore.kernel.org/linux-usb/20220907215818.2670097-1-m.grzeschik@xxxxxxxxxxxxxx/
> 
> Another patch here also adds WQ_CPU_INTENSIVE.
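
For reference, both of those changes come down to the flags passed when
allocating the pump workqueue. A sketch, with the queue name and max_active
value as assumptions:

/*
 * Sketch of a pump workqueue allocated with the flags mentioned above.
 */
#include <linux/errno.h>
#include <linux/workqueue.h>

static struct workqueue_struct *uvc_pump_wq;

static int uvc_pump_wq_init(void)
{
	/*
	 * WQ_HIGHPRI runs work items on high-priority kworkers to cut
	 * scheduling latency; WQ_CPU_INTENSIVE keeps the memcpy-heavy
	 * pump work from stalling other work items on the same CPU.
	 */
	uvc_pump_wq = alloc_workqueue("uvcgadget",
				      WQ_HIGHPRI | WQ_CPU_INTENSIVE, 0);

	return uvc_pump_wq ? 0 : -ENOMEM;
}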
> 
> But, after all the input from Thinh, it is probably better to solve the
> issue in a more reliable way.
> 
> >Or was it a general mitigation against racy usb request submission from v4l2 buffer
> >queuing, stream enable and the video complete handler firing?
> 
> I don't remember all of the issues we saw back then. But this is also a very
> likely scenario.
> 
> > I was chatting with Avi about this, what if we submit requests to the endpoint
> > only at two points in the streaming lifecycle -
> > 1) The whole 64 (or however many usb requests are allocated) when
> >    uvcg_video_enable() is called - with 0 length usb_requests.
> > 2) In the video complete handler - if a video buffer is available, we encode it
> >    and submit it to the endpoint. If not, we send a 0 length request.
> 
> It really sounds like a good idea.
> 
> > This way we're really maintaining back pressure and sending requests as soon
> > as we can to the dwc3 controller. Encoding is mostly memcpys from what I see
> > so hopefully not too heavy on the interrupt handler. I will work on
> > prototyping this in the meantime.
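
Point 1) of that scheme could look roughly like the sketch below; the
function name and parameters are hypothetical:

/*
 * Sketch of priming the whole queue with 0-length requests when the
 * stream is enabled, so the controller has every allocated request in
 * flight from the start.
 */
#include <linux/usb/gadget.h>

static int uvcg_video_prime_queue(struct usb_ep *ep,
				  struct usb_request **reqs,
				  unsigned int nreqs)
{
	unsigned int i;
	int ret;

	for (i = 0; i < nreqs; i++) {
		reqs[i]->length = 0;	/* no frame data queued yet */

		/* The stream-enable path may sleep, so GFP_KERNEL is fine. */
		ret = usb_ep_queue(ep, reqs[i], GFP_KERNEL);
		if (ret)
			return ret;
	}

	return 0;
}

From then on, the completion handler alone keeps the queue full, per point 2).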

-- 
Regards,

Laurent Pinchart



