Re: uvc gadget: Making upper bound of number of usb requests allocated configurable through configfs

Hi Michael,

On 10/19/23 16:15, Michael Grzeschik wrote:
> On Wed, Oct 18, 2023 at 03:28:57PM +0200, Michael Grzeschik wrote:
>> On Sun, Oct 15, 2023 at 09:33:43PM -0700, Jayant Chowdhary wrote:
>>> On 10/12/23 11:50, Thinh Nguyen wrote:
>>>> On Mon, Oct 09, 2023, Jayant Chowdhary wrote:
>>>>>> On Fri, Oct 06, 2023 at 03:03:56PM -0700, Jayant Chowdhary wrote:
>>>>>>> We had been seeing the UVC gadget driver receive isoc errors while
>>>>>>> sending packets to the usb endpoint - resulting in glitches being shown
>>>>>>> on linux hosts. My colleague Avichal Rakesh and others had a very
>>>>>>> enlightening discussion at
>>>>>>> https://lore.kernel.org/linux-usb/8741b7cb-54ec-410b-caf5-697f81e8ad64@xxxxxxxxxx/T/
>>>>>>>
>>>>>>>
>>>>>>> The conclusion that we came to was: usb requests with actual uvc frame
>>>>>>> data were missing their scheduled uframes in the usb controller. As a
>>>>>>> mitigation, we started sending 0 length usb requests when there was no
>>>>>>> uvc frame buffer available to get data from. Even with this mitigation,
>>>>>>> we are seeing glitches - albeit at a lower frequency.
>>>>>>>
>>>>>>> After some investigation, it is seen that we’re getting isoc errors when
>>>>>>> the worker thread serving video_pump() work items doesn't get scheduled
>>>>>>> for longer periods of time than usual - in most cases > 6 ms.
>>>>>>> This is close enough to the 8ms limit that we have when the number of usb
>>>>>>> requests in the queue is 64 (since we have a 125us uframe period). In order
>>>>>>> to tolerate the scheduling delays better, it helps to increase the number of
>>>>>>> usb requests in the queue. In that case, we have more 0 length requests
>>>>>>> given to the udc driver - and as a result we can wait longer for uvc
>>>>>>> frames with valid data to get processed by video_pump(). I’m attaching a
>>>>>>> patch which lets one configure the upper bound on the number of usb
>>>>>>> requests allocated through configfs. Please let me know your thoughts.
>>>>>>> I can formalize the patch if it looks okay.
>>>>>> Why do you want to limit the upper bound? Why not just not submit so
>>>>>> many requests from userspace, since you control that, right?
>>>>>
>>>>> Userspace negotiates a video frame rate (typically 30/60 fps) with the host, which does
>>>>> not necessarily correspond to the ISOC cadence. After the
>>>>> patch at https://lkml.org/lkml/diff/2023/5/8/1115/1 was submitted, we are
>>>>> maintaining back pressure on the usb controller even if we do not have uvc frame
>>>>> data, by sending the controller 0 length requests (as long as usb_requests are
>>>>> available). Also, even if the userspace application were to somehow produce
>>>>> data to match the ISOC rate, it would need to have information about USB
>>>>> timing details - which I am not sure is available to userspace or is the right
>>>>> thing to do here?
>>>>>
>>>>> Here, we are trying to handle the scenario in which the video_pump() worker
>>>>> thread does not get scheduled in time - by increasing the number of usb requests
>>>>> allocated in the queue. This would send more usb requests to the usb controller
>>>>> when video_pump() does get scheduled - even if they're 0 length. This buys
>>>>> the video_pump() worker thread scheduling time - since more usb requests
>>>>> are with the controller, subsequent requests sent will not be 'stale' and
>>>>> dropped by the usb controller.
>>>>>
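To make the arithmetic above explicit: the sketch below assumes the controller
consumes exactly one request per 125 us microframe; the names are made up.

/* Rough scheduling-latency budget for a given request queue depth,
 * assuming one request is consumed per 125 us microframe.
 */
#define UVC_UFRAME_PERIOD_US    125

static inline unsigned int uvc_latency_budget_us(unsigned int num_requests)
{
        /* 64 requests -> 8000 us, 256 requests -> 32000 us */
        return num_requests * UVC_UFRAME_PERIOD_US;
}
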
>>>> I believe you're testing against the dwc3 controller, right? I may not be as
>>>> familiar with the UVC function driver, but based on the previous
>>>> discussions, I think the driver should be able to handle this without
>>>> user input.
>>>
>>> Yes we are testing against the dwc3 controller.
>>>
>>>>
>>>> The frequency of the request submission should not depend on the
>>>> video_pump() worker thread since it can vary. The frequency of request
>>>> submission should match the request completion. We know that the
>>>> request completion rate should be fixed (1 uframe/request + when you
>>>> don't set no_interrupt). Based on this you can do your calculation on how
>>>> often you should set no_interrupt and how many requests you must submit.
>>>> You don't have to wait for the video_pump() to submit 0-length requests.
>>>>
>>>> The only variable here is the completion handler delay or system
>>>> latency, which should not be much and should be within your calculation.
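
If I follow this correctly, the queue depth falls out of the latency we want to
tolerate, and the interrupt cadence can mirror the "every quarter of the queue"
heuristic uvcg_video_pump() already uses. A sketch with invented names, just to
check my understanding:

/* Sketch only: size the queue from the tolerated latency and raise a
 * completion interrupt every quarter of the queue, like the existing
 * req_int_count logic. Both function names are invented.
 */
#include <linux/kernel.h>       /* DIV_ROUND_UP() */

#define UVC_UFRAME_PERIOD_US    125

static unsigned int uvc_requests_for_latency(unsigned int max_latency_us)
{
        return DIV_ROUND_UP(max_latency_us, UVC_UFRAME_PERIOD_US);
}

static bool uvc_want_interrupt(unsigned int req_int_count, unsigned int num_requests)
{
        return !(req_int_count % DIV_ROUND_UP(num_requests, 4));
}
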
>>>
>>>
>>> Thanks for the suggestion. It indeed makes sense that we do not completely depend on
>>> video_pump() for sending 0 length requests. I was concerned about
>>> synchronization needed when we send requests to the dwc3 controller from
>>> different threads. I see that the dwc3 controller code does internally serialize
>>> queueing requests; can we expect this from other controllers as well?
>>>
>>> This brings me to another question for Michael - I see
>>> that we introduced a worker thread for pumping usb requests to the usb endpoint
>>> in https://lore.kernel.org/all/20200427151614.10868-1-m.grzeschik@xxxxxxxxxxxxxx/
>>> (I see multiple email addresses, so apologies if I used the incorrect one).
>>>
>>> Did we introduce the worker thread to solve some specific deadlock scenarios?
>>
>> Exactly. This was the reason why we moved to the pump worker. I actually
>> looked into the host side implementation of the uvc driver. There we
>> also queue a worker from the complete function:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_video.c#n1646
>>
>> So this sounded reasonable to me. However, we faced similar issues to
>> yours and tried different ways to improve the latency.
>>
>> One thing we did to improve the latency was adding WQ_HIGHPRI:
>>
>> https://lore.kernel.org/linux-usb/20220907215818.2670097-1-m.grzeschik@xxxxxxxxxxxxxx/
>>
>> Another patch here also added WQ_CPU_INTENSIVE.
>>
>> But, after all the input from Thinh, it is probably better to solve the
>> issue in a more reliable way.
>>
>>> Or was it a general mitigation against racy usb request submission from v4l2 buffer
>>> queuing, stream enable, and the video complete handler firing?
>>
>> I don't remember all of the issues we saw back then. But this is also a very
>> likely scenario.
>>
>>> I was chatting with Avi about this, what if we submit requests to the endpoint
>>> only at two points in the streaming lifecycle -
>>> 1) The whole 64 (or however many usb requests are allocated) when
>>>  uvcg_video_enable() is called - with 0 length usb_requests.
>>> 2) In the video complete handler - if a video buffer is available, we encode it
>>>  and submit it to the endpoint. If not, we send a 0 length request.
>>
>> It really sounds like a good idea.
>>
>>> This way we're really maintaining back pressure and sending requests as soon
>>> as we can to the dwc3 controller. Encoding is mostly memcpys from what I see
>>> so hopefully not too heavy on the interrupt handler. I will work on prototyping
>>> this meanwhile.
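
Very roughly, what I have in mind for point 2 - a sketch only, reusing helpers
the driver already has, with locking and error handling omitted:

/* Sketch of the complete-handler-driven scheme: on every completion the
 * request is refilled either with encoded frame data or with a 0-length
 * payload, so the controller never runs dry. Assumes the uvc gadget
 * driver's local headers (uvc.h, uvc_queue.h).
 */
static void uvc_video_complete_sketch(struct usb_ep *ep, struct usb_request *req)
{
        struct uvc_request *ureq = req->context;
        struct uvc_video *video = ureq->video;
        struct uvc_buffer *buf;

        buf = uvcg_queue_head(&video->queue);
        if (buf)
                video->encode(req, video, buf); /* copy frame data + UVC header */
        else
                req->length = 0;                /* keep the isoc schedule fed */

        if (usb_ep_queue(ep, req, GFP_ATOMIC))
                uvcg_queue_cancel(&video->queue, 0);
}
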
>
> [1] https://lore.kernel.org/linux-usb/20230930184821.310143-1-arakesh@xxxxxxxxxx/T/#t
>
> It was actually not that hard to do that.
> With the patches from this thread applied [1], the informal changes look like this:
>
> #change 1
>
> diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c
> index f64d03136c5665..29cd23c38eb99d 100644
> --- a/drivers/usb/gadget/function/uvc_v4l2.c
> +++ b/drivers/usb/gadget/function/uvc_v4l2.c
> @@ -626,8 +626,7 @@ uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b)
>      if (ret < 0)
>          return ret;
>  
> -    if (uvc->state == UVC_STATE_STREAMING)
> -        queue_work(video->async_wq, &video->pump);
> +    uvcg_video_pump(video);
>  
>      return ret;
>  }
> diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
> index 2ec51ed5e9d074..2fe800500c88a3 100644
> --- a/drivers/usb/gadget/function/uvc_video.c
> +++ b/drivers/usb/gadget/function/uvc_video.c
> @@ -329,7 +329,9 @@ uvc_video_complete(struct usb_ep *ep, struct usb_request *req)
>       */
>      if (video->is_enabled) {
>          list_add_tail(&req->list, &video->req_free);
> -        queue_work(video->async_wq, &video->pump);
> +        spin_unlock_irqrestore(&video->req_lock, flags);
> +        uvcg_video_pump(video);
> +        return;
>      } else {
>          uvc_video_free_request(ureq, ep);
>      }
> @@ -413,9 +415,8 @@ uvc_video_alloc_requests(struct uvc_video *video)
>   * This function fills the available USB requests (listed in req_free) with
>   * video data from the queued buffers.
>   */
> -static void uvcg_video_pump(struct work_struct *work)
> +int uvcg_video_pump(struct uvc_video *video)
>  {
> -    struct uvc_video *video = container_of(work, struct uvc_video, pump);
>      struct uvc_video_queue *queue = &video->queue;
>      /* video->max_payload_size is only set when using bulk transfer */
>      bool is_bulk = video->max_payload_size;
> @@ -427,7 +428,7 @@ static void uvcg_video_pump(struct work_struct *work)
>  
>      while(true) {
>          if (!video->ep->enabled)
> -            return;
> +            return 0;
>  
>          /*
>           * Check is_enabled and retrieve the first available USB
> @@ -436,7 +437,7 @@ static void uvcg_video_pump(struct work_struct *work)
>          spin_lock_irqsave(&video->req_lock, flags);
>          if (!video->is_enabled || list_empty(&video->req_free)) {
>              spin_unlock_irqrestore(&video->req_lock, flags);
> -            return;
> +            return -EBUSY;
>          }
>          req = list_first_entry(&video->req_free, struct usb_request,
>                      list);
> @@ -513,7 +514,7 @@ static void uvcg_video_pump(struct work_struct *work)
>      }
>  
>      if (!req)
> -        return;
> +        return 0;
>  
>      spin_lock_irqsave(&video->req_lock, flags);
>      if (video->is_enabled)
> @@ -521,6 +522,8 @@ static void uvcg_video_pump(struct work_struct *work)
>      else
>          uvc_video_free_request(req->context, video->ep);
>      spin_unlock_irqrestore(&video->req_lock, flags);
> +
> +    return 0;
>  }
>  
>  /*
> @@ -554,7 +557,6 @@ uvcg_video_disable(struct uvc_video *video)
>      }
>      spin_unlock_irqrestore(&video->req_lock, flags);
>  
> -    cancel_work_sync(&video->pump);
>      uvcg_queue_cancel(&video->queue, 0);
>  
>      spin_lock_irqsave(&video->req_lock, flags);
> @@ -635,8 +637,6 @@ int uvcg_video_enable(struct uvc_video *video, int enable)
>  
>      video->req_int_count = 0;
>  
> -    queue_work(video->async_wq, &video->pump);
> -
>      return ret;
>  }
>  
> @@ -649,12 +649,6 @@ int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc)
>      INIT_LIST_HEAD(&video->ureqs);
>      INIT_LIST_HEAD(&video->req_free);
>      spin_lock_init(&video->req_lock);
> -    INIT_WORK(&video->pump, uvcg_video_pump);
> -
> -    /* Allocate a work queue for asynchronous video pump handler. */
> -    video->async_wq = alloc_workqueue("uvcgadget", WQ_UNBOUND | WQ_HIGHPRI | WQ_CPU_INTENSIVE, 0);
> -    if (!video->async_wq)
> -        return -EINVAL;
>  
>      video->uvc = uvc;
>  
> diff --git a/drivers/usb/gadget/function/uvc_video.h b/drivers/usb/gadget/function/uvc_video.h
> index 03adeefa343b71..322b05da43965f 100644
> --- a/drivers/usb/gadget/function/uvc_video.h
> +++ b/drivers/usb/gadget/function/uvc_video.h
> @@ -16,6 +16,8 @@ struct uvc_video;
>  
>  int uvcg_video_enable(struct uvc_video *video, int enable);
>  
> +int uvcg_video_pump(struct uvc_video *video);
> +
>  int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc);
>  
>  #endif /* __UVC_VIDEO_H__ */
>
>

Thank you for this. I made some slight modifications (nothing functional)
and applied this. I'm actually seeing that the flickers completely disappear
on the device that I'm testing.

From around a flicker every couple of minutes to none in 20 minutes. What I did
keep was the 0 length request submission, since the camera naturally produces
data at a much lower rate than what the usb controller expects. Is there a reason we would
want to remove that code?
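
To put rough numbers on "much lower rate": the sensor produces 30 or 60 frames
per second while the isoc endpoint is serviced every 125 us, so the vast
majority of requests have to be zero-length either way. Back-of-the-envelope
sketch:

/* At a 125 us service interval the controller consumes 8000 requests/s,
 * while the camera produces 30 or 60 frames/s, so most requests carry no
 * frame data and are 0-length just to keep the schedule.
 */
#define UFRAMES_PER_SEC         8000    /* 1 s / 125 us */

static unsigned int uframes_per_video_frame(unsigned int fps)
{
        return UFRAMES_PER_SEC / fps;   /* ~266 at 30 fps, ~133 at 60 fps */
}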

> #change 2
>
> Also, if you would like to revert the zero-length request generation, apply this on top.
>
> diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
> index 82695a2ff39aa3..2a3c87079c548d 100644
> --- a/drivers/usb/gadget/function/uvc_video.c
> +++ b/drivers/usb/gadget/function/uvc_video.c
> @@ -418,12 +418,9 @@ uvc_video_alloc_requests(struct uvc_video *video)
>  int uvcg_video_pump(struct uvc_video *video)
>  {
>      struct uvc_video_queue *queue = &video->queue;
> -    /* video->max_payload_size is only set when using bulk transfer */
> -    bool is_bulk = video->max_payload_size;
>      struct usb_request *req = NULL;
>      struct uvc_buffer *buf;
>      unsigned long flags;
> -    bool buf_done;
>      int ret;
>  
>      while(true) {
> @@ -450,28 +447,13 @@ int uvcg_video_pump(struct uvc_video *video)
>           */
>          spin_lock_irqsave(&queue->irqlock, flags);
>          buf = uvcg_queue_head(queue);
> -
> -        if (buf != NULL) {
> -            video->encode(req, video, buf);
> -            buf_done = buf->state == UVC_BUF_STATE_DONE;
> -        } else if (!(queue->flags & UVC_QUEUE_DISCONNECTED) && !is_bulk) {
> -            /*
> -             * No video buffer available; the queue is still connected and
> -             * we're transferring over ISOC. Queue a 0 length request to
> -             * prevent missed ISOC transfers.
> -             */
> -            req->length = 0;
> -            buf_done = false;
> -        } else {
> -            /*
> -             * Either the queue has been disconnected or no video buffer
> -             * available for bulk transfer. Either way, stop processing
> -             * further.
> -             */
> +        if (buf == NULL) {
>              spin_unlock_irqrestore(&queue->irqlock, flags);
>              break;
>          }
>  
> +        video->encode(req, video, buf);
> +
>          /*
>           * With USB3 handling more requests at a higher speed, we can't
>           * afford to generate an interrupt for every request. Decide to
> @@ -490,7 +472,8 @@ int uvcg_video_pump(struct uvc_video *video)
>           *   indicated by video->uvc_num_requests), as a trade-off
>           *   between latency and interrupt load.
>           */
> -        if (list_empty(&video->req_free) || buf_done ||
> +        if (list_empty(&video->req_free) ||
> +            buf->state == UVC_BUF_STATE_DONE ||
>              !(video->req_int_count %
>                 DIV_ROUND_UP(video->uvc_num_requests, 4))) {
>              video->req_int_count = 0;
> @@ -510,7 +493,8 @@ int uvcg_video_pump(struct uvc_video *video)
>  
>          /* Endpoint now owns the request */
>          req = NULL;
> -        video->req_int_count++;
> +        if(buf->state != UVC_BUF_STATE_DONE)
> +            video->req_int_count++;
>      }
>  
>      if (!req)
>
>
> In my case this did not change the flickering a lot.
>
> In fact I did see the most effective change when increasing the
> fifo size in the dwc3 controller like this in drivers/usb/dwc3/gadget.c
>
> -813         fifo = dwc3_gadget_calc_tx_fifo_size(dwc, 1);
> +813         fifo = dwc3_gadget_calc_tx_fifo_size(dwc, 3);
>
> The system I am testing against is stressed with high memory bandwidth
> use. So it is possible that in this scenario the hardware fifo will not
> get filled fast enough, and that changing the fifo size therefore helps. It is
> still just a thread to pull on, but I think it is worth digging a bit
> deeper here.
>
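If the larger multiplier is what really helps, maybe it is worth making it
tunable rather than hard-coding it. A purely hypothetical sketch - as far as I
know there is no such dwc3 parameter today:

/* Hypothetical: expose the TX FIFO multiplier as a module parameter so it
 * can be tuned on memory-bandwidth-starved systems. "txfifo_mult" is an
 * invented name, not an existing dwc3 knob.
 */
#include <linux/module.h>

static unsigned int txfifo_mult = 1;
module_param(txfifo_mult, uint, 0444);
MODULE_PARM_DESC(txfifo_mult, "Multiplier for dwc3_gadget_calc_tx_fifo_size()");

        /* and at the call site quoted above: */
        fifo = dwc3_gadget_calc_tx_fifo_size(dwc, txfifo_mult);
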
> I am not sure if you are already aware of the following discussion:
>
> https://lore.kernel.org/all/ZPo51EUtBgH+qw44@xxxxxxxxxxxxxx/T/

Thank you for this. I wasn't aware of this thread; I will give it a read!
To confirm, should I still put up a patch for removing the video_pump() worker
thread, or are you planning on doing that?

Thank you,
Jayant

>
> Regards,
> Michael
>



