Re: uvc gadget: Making upper bound of number of usb requests allocated configurable through configfs

On Wed, Oct 18, 2023 at 03:28:57PM +0200, Michael Grzeschik wrote:
On Sun, Oct 15, 2023 at 09:33:43PM -0700, Jayant Chowdhary wrote:
On 10/12/23 11:50, Thinh Nguyen wrote:
On Mon, Oct 09, 2023, Jayant Chowdhary wrote:
On Fri, Oct 06, 2023 at 03:03:56PM -0700, Jayant Chowdhary wrote:
We had been seeing the UVC gadget driver receive isoc errors while
sending packets to the usb endpoint - resulting in glitches being shown
on linux hosts. My colleague Avichal Rakesh and others had a very
enlightening discussion at
https://lore.kernel.org/linux-usb/8741b7cb-54ec-410b-caf5-697f81e8ad64@xxxxxxxxxx/T/


The conclusion that we came to was: usb requests with actual uvc frame
data were missing their scheduled uframes in the usb controller. As a
mitigation, we started sending 0 length usb requests when there was no
uvc frame buffer available to get data from. Even with this mitigation,
we are seeing glitches - albeit at a lower frequency.

After some investigation, it is seen that we’re getting isoc errors when
the worker thread serving video_pump() work items doesn’t get scheduled
for longer periods of time than usual - in most cases > 6ms.
This is close enough to the 8ms limit that we have when the number of usb
requests in the queue is 64 (since we have a 125us uframe period). In order
to tolerate the scheduling delays better, it helps to increase the number of
usb requests in the queue. In that case, we have more 0 length requests
given to the udc driver - and as a result we can wait longer for uvc
frames with valid data to get processed by video_pump(). I’m attaching a
patch which lets one configure the upper bound on the number of usb
requests allocated through configfs. Please let me know your thoughts.
I can formalize the patch if it looks okay.
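
For reference, the 8ms figure follows directly from the queue depth quoted above:

    64 requests x 125us per uframe = 8ms of queued transfers

so a video_pump() scheduling delay of more than 6ms leaves less than 2ms of
margin before the controller runs out of queued requests.
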
Why do you want to limit the upper bound? Why not just not submit so
many requests from userspace, as you control that, right?

Userspace negotiates a video frame rate (typically 30/60fps) with the host that does
not necessarily correspond to the ISOC cadence. After the
patch at https://lkml.org/lkml/diff/2023/5/8/1115/1 was submitted, we are
maintaining back pressure on the usb controller even if we do not have uvc frame
data, by sending the controller 0 length requests (as long as usb_requests are
available). Also, even if the userspace application were to somehow produce
data to match the ISOC rate, it would need to have information about USB
timing details - which I am not sure is available to userspace or is the right
thing to do here?

Here, we are trying to handle the scenario in which the video_pump() worker
thread does not get scheduled in time, by increasing the number of usb requests
allocated in the queue. This would send more usb requests to the usb controller
when video_pump() does get scheduled - even if they're 0 length. This buys
the video_pump() worker thread scheduling time - since more usb requests
are with the controller, subsequent requests sent will not be 'stale' and
dropped by the usb controller.

I believe you're testing against the dwc3 controller, right? I may not be as
familiar with the UVC function driver, but based on the previous
discussions, I think the driver should be able to handle this without
user input.

Yes, we are testing against the dwc3 controller.


The frequency of the request submission should not depend on the
video_pump() worker thread since it can vary. The frequency of request
submission should match the request completion. We know that the
request completion rate should be fixed (1 uframe per request, plus an
interrupt whenever you don't set no_interrupt). Based on this you can
calculate how often you should set no_interrupt and how many requests
you must submit. You don't have to wait for video_pump() to submit
0-length requests.

The only variable here is the completion handler delay or system
latency, which should not be much and should be within your calculation.
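
Purely as an illustration of that calculation, here is a minimal sketch - not
the actual driver code: it reuses the req_int_count counter, uvc_num_requests
and the usb_request no_interrupt flag that the gadget driver already has, and
the helper name sketch_queue_to_ep is made up:

static void sketch_queue_to_ep(struct uvc_video *video, struct usb_request *req)
{
	/*
	 * One request completes per 125us uframe, so interrupting only every
	 * uvc_num_requests/4 completions (~2ms with 64 requests) still leaves
	 * the completion handler time to refill before the ~8ms queue drains.
	 */
	if (!(video->req_int_count % DIV_ROUND_UP(video->uvc_num_requests, 4))) {
		video->req_int_count = 0;
		req->no_interrupt = 0;
	} else {
		req->no_interrupt = 1;
	}
	video->req_int_count++;

	if (usb_ep_queue(video->ep, req, GFP_ATOMIC) < 0)
		uvcg_queue_cancel(&video->queue, 0);
}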


Thanks for the suggestion. It indeed makes sense that we do not completely depend on
video_pump() for sending 0 length requests. I was concerned about the
synchronization needed when we send requests to the dwc3 controller from
different threads. I see that the dwc3 controller code does internally serialize
queueing requests; can we expect this from other controllers as well?

This brings me to another question for Michael - I see
that we introduced a worker thread for pumping usb requests to the usb endpoint
in https://lore.kernel.org/all/20200427151614.10868-1-m.grzeschik@xxxxxxxxxxxxxx/
(I see multiple email addresses, so apologies if I used the incorrect one).

Did we introduce the worker thread to solve some specific deadlock scenarios?

Exactly. This was the reason why we moved to the pump worker. I actually
looked into the host side implementation of the uvc driver. There we
also queue a worker from the complete function:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_video.c#n1646

So this sounded reasonable to me. However, we faced similar issues to
yours and introduced different ways to improve the latency.

One thing we did to improve the latency was adding WQ_HIGHPRI:

https://lore.kernel.org/linux-usb/20220907215818.2670097-1-m.grzeschik@xxxxxxxxxxxxxx/

Another patch here also adds WQ_CPU_INTENSIVE.

But after all the input from Thinh, it is probably better to solve the
issue in a more reliable way.

Or was it a general mitigation against racy usb request submission from v4l2 buffer
queuing, stream enable, and the video complete handler firing?

I don't remember all of the issues we saw back then. But this is also a very
likely scenario.

I was chatting with Avi about this: what if we submit requests to the endpoint
only at two points in the streaming lifecycle?
1) The whole 64 (or however many usb requests are allocated) when
 uvcg_video_enable() is called - with 0 length usb_requests.
2) In the video complete handler - if a video buffer is available, we encode it
 and submit it to the endpoint. If not, we send a 0 length request.

It really sounds like a good idea.

This way we're really maintaining back pressure and sending requests as soon
as we can to the dwc3 controller. Encoding is mostly memcpys from what I see,
so it is hopefully not too heavy on the interrupt handler. I will work on
prototyping this in the meantime.

[1] https://lore.kernel.org/linux-usb/20230930184821.310143-1-arakesh@xxxxxxxxxx/T/#t

It was actually not that hard to do.
With the patches from this thread applied [1], the informal changes look like this:

#change 1

diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c
index f64d03136c5665..29cd23c38eb99d 100644
--- a/drivers/usb/gadget/function/uvc_v4l2.c
+++ b/drivers/usb/gadget/function/uvc_v4l2.c
@@ -626,8 +626,7 @@ uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b)
 	if (ret < 0)
 		return ret;
 
-	if (uvc->state == UVC_STATE_STREAMING)
-		queue_work(video->async_wq, &video->pump);
+	uvcg_video_pump(video);
 
 	return ret;
 }
diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
index 2ec51ed5e9d074..2fe800500c88a3 100644
--- a/drivers/usb/gadget/function/uvc_video.c
+++ b/drivers/usb/gadget/function/uvc_video.c
@@ -329,7 +329,9 @@ uvc_video_complete(struct usb_ep *ep, struct usb_request *req)
 	 */
 	if (video->is_enabled) {
 		list_add_tail(&req->list, &video->req_free);
-		queue_work(video->async_wq, &video->pump);
+		spin_unlock_irqrestore(&video->req_lock, flags);
+		uvcg_video_pump(video);
+		return;
 	} else {
 		uvc_video_free_request(ureq, ep);
 	}
@@ -413,9 +415,8 @@ uvc_video_alloc_requests(struct uvc_video *video)
  * This function fills the available USB requests (listed in req_free) with
  * video data from the queued buffers.
  */
-static void uvcg_video_pump(struct work_struct *work)
+int uvcg_video_pump(struct uvc_video *video)
 {
-	struct uvc_video *video = container_of(work, struct uvc_video, pump);
 	struct uvc_video_queue *queue = &video->queue;
 	/* video->max_payload_size is only set when using bulk transfer */
 	bool is_bulk = video->max_payload_size;
@@ -427,7 +428,7 @@ static void uvcg_video_pump(struct work_struct *work)
 
 	while(true) {
 		if (!video->ep->enabled)
-			return;
+			return 0;
 
 		/*
 		 * Check is_enabled and retrieve the first available USB
@@ -436,7 +437,7 @@ static void uvcg_video_pump(struct work_struct *work)
 		spin_lock_irqsave(&video->req_lock, flags);
 		if (!video->is_enabled || list_empty(&video->req_free)) {
 			spin_unlock_irqrestore(&video->req_lock, flags);
-			return;
+			return -EBUSY;
 		}
 		req = list_first_entry(&video->req_free, struct usb_request,
 					list);
@@ -513,7 +514,7 @@ static void uvcg_video_pump(struct work_struct *work)
 	}
 
 	if (!req)
-		return;
+		return 0;
 
 	spin_lock_irqsave(&video->req_lock, flags);
 	if (video->is_enabled)
@@ -521,6 +522,8 @@ static void uvcg_video_pump(struct work_struct *work)
 	else
 		uvc_video_free_request(req->context, video->ep);
 	spin_unlock_irqrestore(&video->req_lock, flags);
+
+	return 0;
 }
 
 /*
@@ -554,7 +557,6 @@ uvcg_video_disable(struct uvc_video *video)
 	}
 	spin_unlock_irqrestore(&video->req_lock, flags);
 
-	cancel_work_sync(&video->pump);
 	uvcg_queue_cancel(&video->queue, 0);
 
 	spin_lock_irqsave(&video->req_lock, flags);
@@ -635,8 +637,6 @@ int uvcg_video_enable(struct uvc_video *video, int enable)
 	video->req_int_count = 0;
 
-	queue_work(video->async_wq, &video->pump);
-
 	return ret;
 }
@@ -649,12 +649,6 @@ int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc)
 	INIT_LIST_HEAD(&video->ureqs);
 	INIT_LIST_HEAD(&video->req_free);
 	spin_lock_init(&video->req_lock);
-	INIT_WORK(&video->pump, uvcg_video_pump);
-
-	/* Allocate a work queue for asynchronous video pump handler. */
-	video->async_wq = alloc_workqueue("uvcgadget", WQ_UNBOUND | WQ_HIGHPRI | WQ_CPU_INTENSIVE, 0);
-	if (!video->async_wq)
-		return -EINVAL;
 
 	video->uvc = uvc;

diff --git a/drivers/usb/gadget/function/uvc_video.h b/drivers/usb/gadget/function/uvc_video.h
index 03adeefa343b71..322b05da43965f 100644
--- a/drivers/usb/gadget/function/uvc_video.h
+++ b/drivers/usb/gadget/function/uvc_video.h
@@ -16,6 +16,8 @@ struct uvc_video;
 int uvcg_video_enable(struct uvc_video *video, int enable);
 
+int uvcg_video_pump(struct uvc_video *video);
+
 int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc);
 
 #endif /* __UVC_VIDEO_H__ */


#change 2

Also, if you would like to revert the zero-length request generation, apply this on top.

diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
index 82695a2ff39aa3..2a3c87079c548d 100644
--- a/drivers/usb/gadget/function/uvc_video.c
+++ b/drivers/usb/gadget/function/uvc_video.c
@@ -418,12 +418,9 @@ uvc_video_alloc_requests(struct uvc_video *video)
 int uvcg_video_pump(struct uvc_video *video)
 {
 	struct uvc_video_queue *queue = &video->queue;
-	/* video->max_payload_size is only set when using bulk transfer */
-	bool is_bulk = video->max_payload_size;
 	struct usb_request *req = NULL;
 	struct uvc_buffer *buf;
 	unsigned long flags;
-	bool buf_done;
 	int ret;
 	while(true) {
@@ -450,28 +447,13 @@ int uvcg_video_pump(struct uvc_video *video)
 		 */
 		spin_lock_irqsave(&queue->irqlock, flags);
 		buf = uvcg_queue_head(queue);
-
-		if (buf != NULL) {
-			video->encode(req, video, buf);
-			buf_done = buf->state == UVC_BUF_STATE_DONE;
-		} else if (!(queue->flags & UVC_QUEUE_DISCONNECTED) && !is_bulk) {
-			/*
-			 * No video buffer available; the queue is still connected and
-			 * we're transferring over ISOC. Queue a 0 length request to
-			 * prevent missed ISOC transfers.
-			 */
-			req->length = 0;
-			buf_done = false;
-		} else {
-			/*
-			 * Either the queue has been disconnected or no video buffer
-			 * available for bulk transfer. Either way, stop processing
-			 * further.
-			 */
+		if (buf == NULL) {
 			spin_unlock_irqrestore(&queue->irqlock, flags);
 			break;
 		}
+		video->encode(req, video, buf);
+
 		/*
 		 * With USB3 handling more requests at a higher speed, we can't
 		 * afford to generate an interrupt for every request. Decide to
@@ -490,7 +472,8 @@ int uvcg_video_pump(struct uvc_video *video)
 		 *   indicated by video->uvc_num_requests), as a trade-off
 		 *   between latency and interrupt load.
 		 */
-		if (list_empty(&video->req_free) || buf_done ||
+		if (list_empty(&video->req_free) ||
+		    buf->state == UVC_BUF_STATE_DONE ||
 		    !(video->req_int_count %
 		       DIV_ROUND_UP(video->uvc_num_requests, 4))) {
 			video->req_int_count = 0;
@@ -510,7 +493,8 @@ int uvcg_video_pump(struct uvc_video *video)
 
 		/* Endpoint now owns the request */
 		req = NULL;
-		video->req_int_count++;
+		if(buf->state != UVC_BUF_STATE_DONE)
+			video->req_int_count++;
 	}
 
 	if (!req)


In my case this did not change much with regard to the flickering.

In fact, the most effective change I saw was increasing the
fifo size in the dwc3 controller, like this in drivers/usb/dwc3/gadget.c
(around line 813):

-	fifo = dwc3_gadget_calc_tx_fifo_size(dwc, 1);
+	fifo = dwc3_gadget_calc_tx_fifo_size(dwc, 3);

The system I am testing against is stressed with high memory bandwidth
use. So it is possible that in this scenario the hardware fifo will not
get filled fast enough, and thus changing the fifo size helps here. It is
still just a thread to pull on, but I think it is worth digging a bit
deeper here.

I am not sure if you are already aware of the following discussion:

https://lore.kernel.org/all/ZPo51EUtBgH+qw44@xxxxxxxxxxxxxx/T/

Regards,
Michael

--
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


