On Sun, May 12, 2024, Michael Grzeschik wrote:
> On Sat, May 11, 2024 at 12:51:57AM +0000, Thinh Nguyen wrote:
> > On Thu, May 09, 2024, Michael Grzeschik wrote:
> > > On Wed, May 08, 2024 at 11:03:00PM +0000, Thinh Nguyen wrote:
> > > > On Sun, May 05, 2024, Michael Grzeschik wrote:
> > > > > On Wed, Apr 24, 2024 at 01:51:01AM +0000, Thinh Nguyen wrote:
> > > > > > 
> > > > > > Right. Unfortunately, dwc3 can only "guess" when the UVC function stops
> > > > > > pumping more requests or whether it's due to some large latency. The
> > > > > > logic to work around this underrun issue will not be foolproof. Perhaps
> > > > > > we can improve upon it, but the solution is better implemented in the
> > > > > > UVC function driver.
> > > > > 
> > > > > Yes, the best way to solve this is in the uvc driver.
> > > > > 
> > > > > > I thought we have the mechanism in the UVC function now to ensure
> > > > > > queuing enough zero-length requests to account for the underrun/latency
> > > > > > issue? What's the issue now?
> > > > > 
> > > > > This is actually only partially true. Even with the zero-length packets
> > > > > it is possible that we run into underruns. This is why we implemented
> > > > > this patch. It happened because another interrupt thread with the same
> > > > > prio on the same CPU as this interrupt thread was keeping the CPU
> > > > 
> > > > Do you have the data on the worst latency?
> > > 
> > > It was something a bit more than 2ms AFAIR, since with one frame enqueued
> > > we only trigger the interrupt every 16 requests (16 * 125us = 2ms).
> > > 
> > > So with at least 2ms latency we did hit the sweet spot in several cases.
> > 
> > For 2ms, we should be able to handle this with the zero-length requests.
> 
> When the interrupt thread is the one that is also enqueuing the
> zero-length requests (like the uvc_video gadget is doing now), we won't
> be able to do that.

How long does enqueuing take? Does it take longer than the number of
intervals that it enqueues?

> > > > Can this other interrupt thread lower its priority relative to UVC? For
> > > > isoc endpoints, data is time critical.
> > > 
> > > The details are not that important. Sure, there is a bug that needed to
> > > be solved. But all I wanted is to improve the overall dwc3 driver.
> > > 
> > > > Currently dwc3 can have up to 255 TRBs per endpoint, potentially 255
> > > > zero-length requests. That's 255 uframes, or roughly ~30ms. Is your
> > > > worst latency more than 30ms? I.e. no handling of transfer completion
> > > > and ep_queue for the whole 255 intervals or 30ms. If that's the case,
> > > > we have problems that cannot just be solved in dwc3.
> > > 
> > > Yes. But as mentioned above, this was not the case. Speaking of, there is
> > > currently a bug in the uvc_video driver, which is not taking into account
> > > that actually every zero-length request should, without exception,
> > > trigger an interrupt.
> > 
> > Not necessarily. You can send multiple sets of zero-length requests
> > with, for example, IOC on the last request of the set.
> 
> Right. But for this we have to know if the last request that will be
> enqueued will be followed by an actual data request. As the uvc_video
> gadget is implemented now, we cannot do this.
> 
> It is only checking whether the prepared list is empty and then enqueues
> zero-length or data requests from the complete handler depending on the
> outcome. It does not know whether the prepared list will have some
> requests ready on the next enqueue.

Can we check that the prepare list always has X amount of requests
instead of checking whether it is empty? If not, fill it up to the X
amount with zero-length requests.
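Roughly something along the lines of the (completely untested) sketch
below is what I have in mind. The prep_count bookkeeping and the
uvc_next_data_req()/uvc_get_zero_len_req() helpers are made up for
illustration and are not part of the current uvc_video code:

/*
 * Illustrative only -- assumes video->req_lock is held and that
 * prep_count (made up here) tracks how many requests are currently
 * queued on the endpoint.
 */
#define UVCG_MIN_QUEUED		4	/* the "X" amount */

static void uvcg_keep_ep_primed(struct uvc_video *video)
{
	struct usb_request *req;

	while (video->prep_count < UVCG_MIN_QUEUED) {
		/* Next request carrying actual frame data, if any. */
		req = uvc_next_data_req(video);
		if (!req) {
			/* No payload ready: top up with a zero-length request. */
			req = uvc_get_zero_len_req(video);
			if (!req)
				break;
			/*
			 * Only the last zero-length request of the set needs
			 * to complete with an interrupt (IOC), so that we get
			 * called again before the queue drains.
			 */
			req->no_interrupt =
				(video->prep_count < UVCG_MIN_QUEUED - 1);
		}

		if (usb_ep_queue(video->ep, req, GFP_ATOMIC) < 0)
			break;

		video->prep_count++;
	}
}

That would also drop the interrupt rate to one IOC per top-up instead of
one per zero-length request.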
BR,
Thinh

> > > Currently we also only scatter them over the 16ms period, like with
> > > the actual payload. But since we feed the video stream with the
> > > interrupts, we lose 2ms of potential ep_queue calls with actual
> > > payload in the worst case.
> > > 
> > > My patch is already in the stack and will be sent today.
> > > 
> > > > > busy. As the dwc3 interrupt thread got to its call, the time was
> > > > > already over and the hw was already drained, although the started
> > > > > list was not yet empty, which was causing the next queued requests
> > > > > to be queued too late (zero-length or not).
> > > > > 
> > > > > Yes, this needed to be solved on the upper level first, by moving
> > > > > the long-running work of the other interrupt thread to another
> > > > > thread or even into userspace.
> > > > 
> > > > Right.
> > > > 
> > > > > However, I thought it would be great if we could somehow find this
> > > > > out in the dwc3 core and make the pump mechanism more robust
> > > > > against such late enqueues.
> > > > 
> > > > The dwc3 core handling of events and ep_queue is relatively quick.
> > > > I'm all for any optimization in the dwc3 core for performance.
> > > > However, I don't want to just continue looking for workarounds in
> > > > the dwc3 core without trying to solve the issue where it should be
> > > > solved. I don't want to sacrifice complexity and/or performance of
> > > > other applications just for UVC.
> > > 
> > > I totally understand this. And as we have already found out more and
> > > more about the underlying complexity of the dwc3 driver, I see more
> > > and more clearly how we have to handle the corner cases. I just
> > > started this conversation with Avichal and you in the other thread.
> > > 
> > > https://lore.kernel.org/all/17192e0f-7f18-49ae-96fc-71054d46f74a@xxxxxxxxxx/
> > > 
> > > I think there is some work to come. To be sure that everybody is on
> > > the same page I will prepare a roadmap on how to proceed and what to
> > > discuss. There are many cases interfering with each other, which
> > > makes the solution pretty complex.
> > 
> > That's great. Let's do that so we can resolve this issue.
> 
> Good
> 
> > > > > This all started with that series.
> > > > > 
> > > > > https://lore.kernel.org/all/20240307-dwc3-gadget-complete-irq-v1-0-4fe9ac0ba2b7@xxxxxxxxxxxxxx/
> > > > > 
> > > > > And patch 2 of this series did work well so far. The next move was
> > > > > this patch.
> > > > > 
> > > > > During the last week of debugging we found out that it has other
> > > > > issues. It is not always safe to read the HWO bit from the driver.
> > > > > 
> > > > > It turns out that after a new TRB was prepared with the HWO bit
> > > > > set, it is not safe to immediately read that value back, as the hw
> > > > > will be doing some operations on exactly that newly prepared TRB.
> > > > > 
> > > > > We ran into this problem when applying this patch. The TRB buffer
> > > > > list was actually filled, but we hit a false positive where the
> > > > > latest HWO bit was 0 (probably due to the hw action in the
> > > > > background) and therefore went into end transfer.
> > > > 
> > > > Thanks for the update.
> 
> -- 
> Pengutronix e.K.                           |                             |
> Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
> 31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
> Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |