On Thu, May 09, 2024, Michael Grzeschik wrote:
> On Wed, May 08, 2024 at 11:03:00PM +0000, Thinh Nguyen wrote:
> > On Sun, May 05, 2024, Michael Grzeschik wrote:
> > > On Wed, Apr 24, 2024 at 01:51:01AM +0000, Thinh Nguyen wrote:
> > >
> > > > Right. Unfortunately, dwc3 can only "guess" when UVC function stops
> > > > pumping more requests or whether it's due to some large latency. The
> > > > logic to work around this underrun issue will not be foolproof. Perhaps
> > > > we can improve upon it, but the solution is better implemented at the
> > > > UVC function driver.
> > >
> > > Yes, the best way to solve this is in the uvc driver.
> > >
> > > > I thought we have the mechanism in UVC function now to ensure queuing
> > > > enough zero-length requests to account for underrun/latency issue?
> > > > What's the issue now?
> > >
> > > This is actually only partially true. Even with the zero-length packets
> > > it is possible that we run into underruns. This is why we implemented
> > > this patch. This has happened because another interrupt thread with the
> > > same prio on the same CPU as this interrupt thread was keeping the CPU
> >
> > Do you have the data on the worst latency?
>
> It was something a bit more than around 2ms AFAIR. Since with one frame
> enqueued we only trigger the interrupt every 16 requests (16*125us = 2ms)
>
> So with at least 2ms latency we did hit the sweet spot in several cases.

For 2ms, we should be able to handle this with the zero-length requests.

>
> > Can this other interrupt thread lower its priority relative to UVC? For
> > isoc endpoint, data is time critical.
>
> The details are not that important. Sure there is a bug, that needed to be
> solved. But all I wanted is to improve the overall dwc3 driver.
>
> > Currently dwc3 can have up to 255 TRBs per endpoint, potentially 255
> > zero-length requests. That's 255 uframe, or roughly ~30ms. Is your worst
> > latency more than 30ms? ie. no handling of transfer completion and
> > ep_queue for the whole 255 intervals or 30ms. If that's the case, we
> > have problems that cannot just be solved in dwc3.
>
> Yes. But as mentioned above, this was not the case. Speaking of, there
> is currently a bug in the uvc_video driver, that is not taking into
> account that actually every zero-length request should without exception
> need to trigger an interrupt.

Not necessarily, you can send multiple sets of zero-length requests
with, for example, IOC set only on the last request of each set.

> Currently we also only scatter them over
> the 16ms period, like with the actual payload. But since we feed the
> video stream with the interrupts, we lose 2ms of potential ep_queue
> calls with actual payload in the worst case.
>
> My patch is already in the stack and will be sent today.
>
> > > busy. As the dwc3 interrupt thread got to its call, the time was already
> > > over and the hw was already drained, although the started list was not
> > > yet empty, which was causing the next queued requests to be queued too
> > > late. (zero length or not)
> > >
> > > Yes, this needed to be solved on the upper level first, by moving the
> > > long running work of the other interrupt thread to another thread or
> > > even into the userspace.
> >
> > Right.
> >
> > >
> > > However I thought it would be great if we could somehow find out in
> > > the dwc3 core and make the pump mechanism more robust against such
> > > late enqueues.
> >
> > The dwc3 core handling of events and ep_queue is relatively quick. I'm
> > all for any optimization in the dwc3 core for performance. However, I
> > don't want to just continue looking for workarounds in the dwc3 core
> > without looking to solve the issue where it should be. I don't want to
> > sacrifice complexity and/or performance to other applications for just
> > UVC.
>
> I totally understand this. And as we already found out more and more
> about the underlying complexity of the dwc3 driver I see more and more
> clearly how we have to handle the corner cases. I just started this
> conversation with Avichal and you in the other thread.
>
> https://lore.kernel.org/all/17192e0f-7f18-49ae-96fc-71054d46f74a@xxxxxxxxxx/
>
> I think there is some work to come. To be sure that everybody is on
> the same page I will prepare a roadmap on how to proceed and what to
> discuss. There are many cases interfering with each other which make the
> solution pretty complex.

That's great. Let's do that so we can resolve this issue.

>
> > > This all started with that series.
> > >
> > > https://lore.kernel.org/all/20240307-dwc3-gadget-complete-irq-v1-0-4fe9ac0ba2b7@xxxxxxxxxxxxxx/
> > >
> > > And patch 2 of this series did work well so far. The next move was this
> > > patch.
> > >
> > > Since the last week debugging we found out that it got other issues.
> > > It is not always safe to read the HWO bit from the driver.
> > >
> > > Turns out that after a new TRB was prepared with the HWO bit set
> > > it is not safe to read that value back immediately, as the hw
> > > will be doing some operations on exactly that newly prepared TRB.
> > >
> > > We ran into this problem when applying this patch. The trb buffer list
> > > was actually filled but we hit a false positive where the latest HWO bit
> > > was 0 (probably due to the hw action in the background) and therefore
> > > went into end transfer.
> > >
> >
> > Thanks for the update.
>

Thanks,
Thinh
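
For reference, the latency numbers quoted in this thread can be checked with a small sketch. It assumes a high-speed isoc endpoint that consumes one request per 125us microframe (bInterval = 1); the helper names are hypothetical and are not part of dwc3 or uvc_video.

```c
#include <stdint.h>

/* Assumption: one request is consumed per 125 us microframe. */
#define UFRAME_US 125u

/* Time covered by nreq back-to-back requests, in microseconds. */
static inline uint32_t isoc_window_us(uint32_t nreq)
{
	return nreq * UFRAME_US;
}

/*
 * Hypothetical helper: returns 1 if `queued` already-queued requests
 * span strictly more time than a scheduling latency of latency_us,
 * i.e. the endpoint would not run dry before the next ep_queue.
 */
static inline int covers_latency(uint32_t queued, uint32_t latency_us)
{
	return isoc_window_us(queued) > latency_us;
}
```

With these assumptions, an interrupt every 16 requests gives a 2000us window, which a latency of 2ms or more is not covered by, while a full ring of 255 requests spans 31875us, the "roughly 30ms" mentioned above.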