Hi Michael, On 10/19/23 16:15, Michael Grzeschik wrote: > On Wed, Oct 18, 2023 at 03:28:57PM +0200, Michael Grzeschik wrote: >> On Sun, Oct 15, 2023 at 09:33:43PM -0700, Jayant Chowdhary wrote: >>> On 10/12/23 11:50, Thinh Nguyen wrote: >>>> On Mon, Oct 09, 2023, Jayant Chowdhary wrote: >>>>>> On Fri, Oct 06, 2023 at 03:03:56PM -0700, Jayant Chowdhary wrote: >>>>>>> We had been seeing the UVC gadget driver receive isoc errors while >>>>>>> sending packets to the usb endpoint - resulting in glitches being shown >>>>>>> on linux hosts. My colleague Avichal Rakesh and others had a very >>>>>>> enlightening discussion at >>>>>>> https://urldefense.com/v3/__https://lore.kernel.org/linux-usb/8741b7cb-54ec-410b-caf5-697f81e8ad64@xxxxxxxxxx/T/__;!!A4F2R9G_pg!e3zVZGt-6Td6HJXqh8GaZAsUeKyvKBhOoyru9qzn3Vkw01Vdkwk7hFr_t5glBG2BYJlOYfFKEUpiH5H4gQ73n_-Y$ >>>>>>> >>>>>>> >>>>>>> The conclusion that we came to was : usb requests with actual uvc frame >>>>>>> data were missing their scheduled uframes in the usb controller. As a >>>>>>> mitigation, we started sending 0 length usb requests when there was no >>>>>>> uvc frame buffer available to get data from. Even with this mitigation, >>>>>>> we are seeing glitches - albeit at a lower frequency. >>>>>>> >>>>>>> After some investigation, it is seen that we’re getting isoc errors when >>>>>>> the worker thread serving video_pump() work items, doesn’t get scheduled >>>>>>> for longer periods of time - than usual - in most cases > 6ms. >>>>>>> This is close enough to the 8ms limit that we have when the number of usb >>>>>>> requests in the queue is 64 (since we have a 125us uframe period). In order >>>>>>> to tolerate the scheduling delays better, it helps to increase the number of >>>>>>> usb requests in the queue . In that case, we have more 0 length requests >>>>>>> given to the udc driver - and as a result we can wait longer for uvc >>>>>>> frames with valid data to get processed by video_pump(). I’m attaching a >>>>>>> patch which lets one configure the upper bound on the number of usb >>>>>>> requests allocated through configfs. Please let me know your thoughts. >>>>>>> I can formalize the patch if it looks okay. >>>>>> Why do you want to limit the upper bound? Why not just not submit so >>>>>> many requests from userspace as you control that, right? >>>>> >>>>> Userspace negotiates a video frame rate (typically 30/60fps) with the host that does >>>>> not necessarily correspond to the ISOC cadence. After the >>>>> patch at https://urldefense.com/v3/__https://lkml.org/lkml/diff/2023/5/8/1115/1__;!!A4F2R9G_pg!e3zVZGt-6Td6HJXqh8GaZAsUeKyvKBhOoyru9qzn3Vkw01Vdkwk7hFr_t5glBG2BYJlOYfFKEUpiH5H4gWbb9bvy$ ; was submitted, we are >>>>> maintaining back pressure on the usb controller even if we do not have uvc frame >>>>> data, by sending the controller 0 length requests (as long as usb_requests are >>>>> available). Also, even if the userspace application were to somehow produce >>>>> data to match the ISOC rate, it would need to have information about USB >>>>> timing details - which I am not sure is available to userspace or is the right >>>>> thing to do here ? >>>>> >>>>> Here, we are trying to handle the scenario in which the video_pump() worker >>>>> thread does not get scheduled in time - by increasing the number of usb requests >>>>> allocated in the queue. This would send more usb requests to the usb controller, >>>>> when video_pump() does get scheduled - even if they're 0 length. This buys >>>>> the video_pump() worker thread scheduling time -since more usb requests >>>>> are with the controller, subsequent requests sent will not be 'stale' and >>>>> dropped by the usb controller. >>>>> >>>> I believe you're testing against dwc3 controller right? I may not be as >>>> familiar with UVC function driver, but based on the previous >>>> discussions, I think the driver should be able to handle this without >>>> the user input. >>> >>> Yes we are testing against the dwc3 controller. >>> >>>> >>>> The frequency of the request submission should not depend on the >>>> video_pump() work thread since it can vary. The frequency of request >>>> submission should match with the request completion. We know that >>>> request completion rate should be fixed (1 uframe/request + when you >>>> don't set no_interrupt). Base on this you can do your calculation on how >>>> often you should set no_interrupt and how many requests you must submit. >>>> You don't have to wait for the video_pump() to submit 0-length requests. >>>> >>>> The only variable here is the completion handler delay or system >>>> latency, which should not be much and should be within your calculation. >>> >>> >>> Thanks for the suggestion. It indeed makes sense that we do not completely depend on >>> video_pump() for sending 0 length requests. I was concerned about >>> synchronization needed when we send requests to the dwc3 controller from >>> different threads. I see that the dwc3 controller code does internally serialize >>> queueing requests, can we expect this from other controllers as well ? >>> >>> This brings me to another question for Michael - I see >>> that we introduced a worker thread for pumping usb requests to the usb endpoint >>> in https://lore.kernel.org/all/20200427151614.10868-1-m.grzeschik@xxxxxxxxxxxxxx/ >>> (I see multiple email addresses, so apologies if I used the incorrect one). >>> >>> Did we introduce the worker thread to solve some specific deadlock scenarios ? >> >> Exactly. This was the reason why we moved to the pump worker. I actually >> looked into the host side implementation of the uvc driver. There we >> also queue an worker from the complete function: >> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_video.c#n1646 >> >> So this sounded reasonable to me. However we faced similar issues like >> you and introduced different ways to improve the latency issue. >> >> One thing we did was improving the latency by adding WQ_HIGHPRI >> >> https://lore.kernel.org/linux-usb/20220907215818.2670097-1-m.grzeschik@xxxxxxxxxxxxxx/ >> >> Another patch here is also adding WQ_CPU_INTENSIVE. >> >> But, after all the input from Thinh it is probably better to solve the >> issue in a more reliable way. >> >>> Or was it a general mitigation against racy usb request submission from v4l2 buffer >>> queuing, stream enable and the video complete handler firing ? >> >> I don't remember all of the issues we saw back then. But this is also an very >> likely scenario. >> >>> I was chatting with Avi about this, what if we submit requests to the endpoint >>> only at two points in the streaming lifecycle - >>> 1) The whole 64 (or however many usb requests are allocated) when >>> uvcg_video_enable() is called - with 0 length usb_requests. >>> 2) In the video complete handler - if a video buffer is available, we encode it >>> and submit it to the endpoint. If not, we send a 0 length request. >> >> It really sounds like a good idea. >> >>> This way we're really maintaining back pressure and sending requests as soon >>> as we can to the dwc3 controller. Encoding is mostly memcpys from what I see >>> so hopefully not too heavy on the interrupt handler. I will work on prototyping >>> this meanwhile. > > [1] https://lore.kernel.org/linux-usb/20230930184821.310143-1-arakesh@xxxxxxxxxx/T/#t > > It was actually not that hard to do that. > With the patches from this thread applied [1] , the unformal changes looks like this: > > #change 1 > > diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c > index f64d03136c5665..29cd23c38eb99d 100644 > --- a/drivers/usb/gadget/function/uvc_v4l2.c > +++ b/drivers/usb/gadget/function/uvc_v4l2.c > @@ -626,8 +626,7 @@ uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b) > if (ret < 0) > return ret; > > - if (uvc->state == UVC_STATE_STREAMING) > - queue_work(video->async_wq, &video->pump); > + uvcg_video_pump(video); > > return ret; > } > diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c > index 2ec51ed5e9d074..2fe800500c88a3 100644 > --- a/drivers/usb/gadget/function/uvc_video.c > +++ b/drivers/usb/gadget/function/uvc_video.c > @@ -329,7 +329,9 @@ uvc_video_complete(struct usb_ep *ep, struct usb_request *req) > */ > if (video->is_enabled) { > list_add_tail(&req->list, &video->req_free); > - queue_work(video->async_wq, &video->pump); > + spin_unlock_irqrestore(&video->req_lock, flags); > + uvcg_video_pump(video); > + return; > } else { > uvc_video_free_request(ureq, ep); > } > @@ -413,9 +415,8 @@ uvc_video_alloc_requests(struct uvc_video *video) > * This function fills the available USB requests (listed in req_free) with > * video data from the queued buffers. > */ > -static void uvcg_video_pump(struct work_struct *work) > +int uvcg_video_pump(struct uvc_video *video) > { > - struct uvc_video *video = container_of(work, struct uvc_video, pump); > struct uvc_video_queue *queue = &video->queue; > /* video->max_payload_size is only set when using bulk transfer */ > bool is_bulk = video->max_payload_size; > @@ -427,7 +428,7 @@ static void uvcg_video_pump(struct work_struct *work) > > while(true) { > if (!video->ep->enabled) > - return; > + return 0; > > /* > * Check is_enabled and retrieve the first available USB > @@ -436,7 +437,7 @@ static void uvcg_video_pump(struct work_struct *work) > spin_lock_irqsave(&video->req_lock, flags); > if (!video->is_enabled || list_empty(&video->req_free)) { > spin_unlock_irqrestore(&video->req_lock, flags); > - return; > + return -EBUSY; > } > req = list_first_entry(&video->req_free, struct usb_request, > list); > @@ -513,7 +514,7 @@ static void uvcg_video_pump(struct work_struct *work) > } > > if (!req) > - return; > + return 0; > > spin_lock_irqsave(&video->req_lock, flags); > if (video->is_enabled) > @@ -521,6 +522,8 @@ static void uvcg_video_pump(struct work_struct *work) > else > uvc_video_free_request(req->context, video->ep); > spin_unlock_irqrestore(&video->req_lock, flags); > + > + return 0; > } > > /* > @@ -554,7 +557,6 @@ uvcg_video_disable(struct uvc_video *video) > } > spin_unlock_irqrestore(&video->req_lock, flags); > > - cancel_work_sync(&video->pump); > uvcg_queue_cancel(&video->queue, 0); > > spin_lock_irqsave(&video->req_lock, flags); > @@ -635,8 +637,6 @@ int uvcg_video_enable(struct uvc_video *video, int enable) > > video->req_int_count = 0; > > - queue_work(video->async_wq, &video->pump); > - > return ret; > } > > @@ -649,12 +649,6 @@ int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc) > INIT_LIST_HEAD(&video->ureqs); > INIT_LIST_HEAD(&video->req_free); > spin_lock_init(&video->req_lock); > - INIT_WORK(&video->pump, uvcg_video_pump); > - > - /* Allocate a work queue for asynchronous video pump handler. */ > - video->async_wq = alloc_workqueue("uvcgadget", WQ_UNBOUND | WQ_HIGHPRI | WQ_CPU_INTENSIVE, 0); > - if (!video->async_wq) > - return -EINVAL; > > video->uvc = uvc; > > diff --git a/drivers/usb/gadget/function/uvc_video.h b/drivers/usb/gadget/function/uvc_video.h > index 03adeefa343b71..322b05da43965f 100644 > --- a/drivers/usb/gadget/function/uvc_video.h > +++ b/drivers/usb/gadget/function/uvc_video.h > @@ -16,6 +16,8 @@ struct uvc_video; > > int uvcg_video_enable(struct uvc_video *video, int enable); > > +int uvcg_video_pump(struct uvc_video *video); > + > int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc); > > #endif /* __UVC_VIDEO_H__ */ > > Thank you for this. I made some slight modifications (nothing functional) and applied this. I'm actually seeing that the flickers completely disappear on the device that I'm testing. >From around a flicker every couple of minutes to none in 20 minutes. What I did keep was the 0 length request submission, since the camera is naturally producing data at a much lower rate than what the usb controller expects. Is there a reason we would want to remove that code ? > #change 2 > > Also if you would like to revert the zero request generation apply this ontop. > > diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c > index 82695a2ff39aa3..2a3c87079c548d 100644 > --- a/drivers/usb/gadget/function/uvc_video.c > +++ b/drivers/usb/gadget/function/uvc_video.c > @@ -418,12 +418,9 @@ uvc_video_alloc_requests(struct uvc_video *video) > int uvcg_video_pump(struct uvc_video *video) > { > struct uvc_video_queue *queue = &video->queue; > - /* video->max_payload_size is only set when using bulk transfer */ > - bool is_bulk = video->max_payload_size; > struct usb_request *req = NULL; > struct uvc_buffer *buf; > unsigned long flags; > - bool buf_done; > int ret; > > while(true) { > @@ -450,28 +447,13 @@ int uvcg_video_pump(struct uvc_video *video) > */ > spin_lock_irqsave(&queue->irqlock, flags); > buf = uvcg_queue_head(queue); > - > - if (buf != NULL) { > - video->encode(req, video, buf); > - buf_done = buf->state == UVC_BUF_STATE_DONE; > - } else if (!(queue->flags & UVC_QUEUE_DISCONNECTED) && !is_bulk) { > - /* > - * No video buffer available; the queue is still connected and > - * we're transferring over ISOC. Queue a 0 length request to > - * prevent missed ISOC transfers. > - */ > - req->length = 0; > - buf_done = false; > - } else { > - /* > - * Either the queue has been disconnected or no video buffer > - * available for bulk transfer. Either way, stop processing > - * further. > - */ > + if (buf == NULL) { > spin_unlock_irqrestore(&queue->irqlock, flags); > break; > } > > + video->encode(req, video, buf); > + > /* > * With USB3 handling more requests at a higher speed, we can't > * afford to generate an interrupt for every request. Decide to > @@ -490,7 +472,8 @@ int uvcg_video_pump(struct uvc_video *video) > * indicated by video->uvc_num_requests), as a trade-off > * between latency and interrupt load. > */ > - if (list_empty(&video->req_free) || buf_done || > + if (list_empty(&video->req_free) || > + buf->state == UVC_BUF_STATE_DONE || > !(video->req_int_count % > DIV_ROUND_UP(video->uvc_num_requests, 4))) { > video->req_int_count = 0; > @@ -510,7 +493,8 @@ int uvcg_video_pump(struct uvc_video *video) > > /* Endpoint now owns the request */ > req = NULL; > - video->req_int_count++; > + if(buf->state != UVC_BUF_STATE_DONE) > + video->req_int_count++; > } > > if (!req) > > > In my case this did not change a lot with the flickering. > > In fact I did see the most effective change when increasing the > fifo size in the dwc3 controller like this in drivers/usb/dwc3/gadget.c > > -813 fifo = dwc3_gadget_calc_tx_fifo_size(dwc, 1); > +813 fifo = dwc3_gadget_calc_tx_fifo_size(dwc, 3); > > My system I am testing against is stressed with a high memory bandwidth > use. So it is possible that in this scenario the hardware fifo will not > get filled fast enough. Thus, changing the fifo size helps here. It is > still just a string to pull on but I think it is worth to dig a bit > deeper here. > > I am not sure if you are already aware of the following discussion: > > https://lore.kernel.org/all/ZPo51EUtBgH+qw44@xxxxxxxxxxxxxx/T/ Thank you for this. I wasn't aware of this thread, I will give it a read! To confirm, should I still put up a patch for removing the video_pump() worker thread or are you planning on doing that ? Thank you, Jayant > > Rergards, > Michael >