On Sun, Oct 15, 2023 at 09:33:43PM -0700, Jayant Chowdhary wrote:
On 10/12/23 11:50, Thinh Nguyen wrote:

On Mon, Oct 09, 2023, Jayant Chowdhary wrote:

On Fri, Oct 06, 2023 at 03:03:56PM -0700, Jayant Chowdhary wrote:

We had been seeing the UVC gadget driver receive isoc errors while sending packets to the usb endpoint, resulting in glitches being shown on linux hosts. My colleague Avichal Rakesh and others had a very enlightening discussion at https://lore.kernel.org/linux-usb/8741b7cb-54ec-410b-caf5-697f81e8ad64@xxxxxxxxxx/T/

The conclusion we came to was: usb requests with actual uvc frame data were missing their scheduled uframes in the usb controller. As a mitigation, we started sending 0-length usb requests when there was no uvc frame buffer available to get data from. Even with this mitigation, we are seeing glitches, albeit at a lower frequency.

After some investigation, we see that we get isoc errors when the worker thread serving video_pump() work items does not get scheduled for longer periods of time than usual - in most cases > 6ms. This is close to the 8ms limit we have when the number of usb requests in the queue is 64 (since we have a 125us uframe period). In order to tolerate the scheduling delays better, it helps to increase the number of usb requests in the queue. In that case, more 0-length requests are given to the udc driver, and as a result we can wait longer for uvc frames with valid data to get processed by video_pump().

I'm attaching a patch which lets one configure the upper bound on the number of usb requests allocated through configfs. Please let me know your thoughts. I can formalize the patch if it looks okay.

Why do you want to limit the upper bound? Why not just not submit so many requests from userspace, as you control that, right?

Userspace negotiates a video frame rate (typically 30/60 fps) with the host that does not necessarily correspond to the ISOC cadence. After the patch at https://lkml.org/lkml/diff/2023/5/8/1115/1 was submitted, we maintain back pressure on the usb controller even if we do not have uvc frame data, by sending the controller 0-length requests (as long as usb_requests are available). Also, even if the userspace application were to somehow produce data to match the ISOC rate, it would need information about USB timing details - which I am not sure is available to userspace, or is the right thing to do here?

Here, we are trying to handle the scenario in which the video_pump() worker thread does not get scheduled in time, by increasing the number of usb requests allocated in the queue. This would send more usb requests to the usb controller when video_pump() does get scheduled - even if they're 0-length. This buys the video_pump() worker thread scheduling time: since more usb requests are with the controller, subsequent requests sent will not be 'stale' and dropped by the usb controller.

I believe you're testing against the dwc3 controller, right? I may not be as familiar with the UVC function driver, but based on the previous discussions, I think the driver should be able to handle this without the user input.

Yes, we are testing against the dwc3 controller.

The frequency of the request submission should not depend on the video_pump() work thread since it can vary. The frequency of request submission should match the request completion. We know that the request completion rate should be fixed (1 uframe/request, when you don't set no_interrupt). Based on this you can calculate how often you should set no_interrupt and how many requests you must submit. You don't have to wait for video_pump() to submit 0-length requests. The only variable here is the completion handler delay or system latency, which should not be much and should be within your calculation.

Thanks for the suggestion. It indeed makes sense that we do not completely depend on video_pump() for sending 0-length requests. I was concerned about the synchronization needed when we send requests to the dwc3 controller from different threads. I see that the dwc3 controller code does internally serialize queueing requests; can we expect this from other controllers as well?

This brings me to another question for Michael - I see that we introduced a worker thread for pumping usb requests to the usb endpoint in https://lore.kernel.org/all/20200427151614.10868-1-m.grzeschik@xxxxxxxxxxxxxx/ (I see multiple email addresses, so apologies if I used the incorrect one). Did we introduce the worker thread to solve some specific deadlock scenarios?
Exactly. This was the reason why we moved to the pump worker.

I actually looked into the host side implementation of the uvc driver. There we also queue a worker from the complete function: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_video.c#n1646 So this sounded reasonable to me.

However, we faced similar issues like you did and introduced different ways to improve the latency. One thing we did was improve the latency by adding WQ_HIGHPRI: https://lore.kernel.org/linux-usb/20220907215818.2670097-1-m.grzeschik@xxxxxxxxxxxxxx/ Another patch here also adds WQ_CPU_INTENSIVE.

But after all the input from Thinh, it is probably better to solve the issue in a more reliable way.
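For concreteness, here is a rough, back-of-the-envelope sketch of the kind of budgeting Thinh describes. The macro names, numbers, and the helper are illustrative assumptions only, not existing driver code; the only facts assumed from the thread are the 125us microframe interval, the 64-request queue, and the no_interrupt flag on struct usb_request:

/*
 * Illustrative budgeting only - not existing driver code.  Assumes a
 * high/super-speed isoc endpoint where the controller consumes one
 * queued request per 125 us microframe.
 */
#include <linux/kernel.h>

#define UVCG_UFRAME_US		125	/* service interval per request */
#define UVCG_WORST_LATENCY_US	8000	/* worst-case delay we want to ride out */

/* Requests that must sit in the controller queue to survive that delay: 64. */
#define UVCG_MIN_QUEUED_REQS	DIV_ROUND_UP(UVCG_WORST_LATENCY_US, UVCG_UFRAME_US)

/*
 * If only every 'intr_interval'-th request raises a completion interrupt
 * (req->no_interrupt = 1 on the rest), each completion must refill at
 * least 'intr_interval' requests, and the interrupt cadence
 * (intr_interval * 125 us) must stay well below the latency budget.
 */
static inline bool uvcg_skip_interrupt(unsigned int req_id,
				       unsigned int intr_interval)
{
	return (req_id % intr_interval) != 0;
}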
Or was it a general mitigation against racy usb request submission from v4l2 buffer queuing, stream enable, and the video complete handler firing?
I don't remember all of the issues we saw back then. But this is also a very likely scenario.
I was chatting with Avi about this: what if we submit requests to the endpoint only at two points in the streaming lifecycle?

1) When uvcg_video_enable() is called - queue all 64 (or however many usb requests are allocated) as 0-length usb_requests.

2) In the video complete handler - if a video buffer is available, we encode it and submit it to the endpoint. If not, we send a 0-length request.
It really sounds like a good idea.
This way we're really maintaining back pressure and sending requests as soon as we can to the dwc3 controller. Encoding is mostly memcpys from what I see, so hopefully it is not too heavy on the interrupt handler. I will work on prototyping this meanwhile.
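For illustration, a minimal sketch of that complete-handler-driven flow might look like the following. The helper uvcg_fill_from_buffer() and the way the uvc_video pointer is recovered from the request are assumptions for the sketch, not current driver code; only usb_ep_queue(), uvcg_queue_cancel(), and the queue irqlock are taken from the existing gadget stack:

/*
 * Sketch only: resubmit every request from its own completion handler,
 * falling back to a 0-length request when no v4l2 buffer is ready.
 */
static void uvcg_video_complete_sketch(struct usb_ep *ep,
				       struct usb_request *req)
{
	struct uvc_video *video = req->context;	/* assumed association */
	unsigned long flags;

	/* status/error accounting omitted for brevity */

	spin_lock_irqsave(&video->queue.irqlock, flags);

	if (!uvcg_fill_from_buffer(video, req)) {
		/* No frame data yet: keep back pressure with a 0-length request. */
		req->length = 0;
	}

	/* Resubmit immediately so the controller's queue never drains. */
	if (usb_ep_queue(ep, req, GFP_ATOMIC) < 0)
		uvcg_queue_cancel(&video->queue, 0);

	spin_unlock_irqrestore(&video->queue.irqlock, flags);
}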
Great!

Michael

--
Pengutronix e.K.                   |                              |
Steuerwalder Str. 21               | http://www.pengutronix.de/   |
31137 Hildesheim, Germany          | Phone: +49-5121-206917-0     |
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917-5555  |