Re: [PATCH v2] usb: dwc3: gadget: check drained isoc ep

Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx> · Sat, 11 May 2024 00:51:57 +0000

On Thu, May 09, 2024, Michael Grzeschik wrote:
> On Wed, May 08, 2024 at 11:03:00PM +0000, Thinh Nguyen wrote:
> > On Sun, May 05, 2024, Michael Grzeschik wrote:
> > > On Wed, Apr 24, 2024 at 01:51:01AM +0000, Thinh Nguyen wrote:
> > > > >
> > > >
> > > > Right. Unfortunately, dwc3 can only "guess" when UVC function stops
> > > > pumping more request or whether it's due to some large latency. The
> > > > logic to workaround this underrun issue will not be foolproof. Perhaps
> > > > we can improve upon it, but the solution is better implement at the UVC
> > > > function driver.
> > > 
> > > Yes, the best way to solve this is in the uvc driver.
> > > 
> > > > I thought we have the mechanism in UVC function now to ensure queuing
> > > > enough zero-length requests to account for underrun/latency issue?
> > > > What's the issue now?
> > > 
> > > This is actually only partially true. Even with the zero-length packages
> > > it is possible that we run into underruns. This is why we implemented
> > > this patch. This has happened because another interrupt thread with the
> > > same prio on the same CPU as this interrupt thread was keeping the CPU
> > 
> > Do you have the data on the worst latency?
> 
> It was something a bit more then around 2ms AFAIR. Since with one frame
> enqueued we only trigger the interrupt every 16 requests (16*125us = 2ms)
> 
> So with at least 2ms latency we did hit the sweet spot in several cases.

For 2ms, we should be able to handle this with the zero-length requests.

> 
> > Can this other interrupt thread lower its priority relative to UVC? For
> > isoc endpoint, data is time critical.
> 
> The details are not that important. Sure the is a bug, that needed to be
> solved. But all I wanted is to improve the overal dwc3 driver.
> 
> > Currently dwc3 can have up to 255 TRBs per endpoint, potentially 255
> > zero-length requests. That's 255 uframe, or roughly ~30ms. Is your worst
> > latency more than 30ms? ie. no handling of transfer completion and
> > ep_queue for the whole 255 intervals or 30ms. If that's the case, we
> > have problems that cannot just be solved in dwc3.
> 
> Yes. But as mentioned above, this was not the case. Speaking of, there
> is currently a bug in the uvc_video driver, that is not taking into
> acount that actually every zero-length request should without exception
> need to trigger an interrupt.

Not necessarily, you can send multiple set of zero-length requests
where, for example, IOC on the last request of the set.

> Currently we also only scatter them over
> the 16ms period, like with the actual payload. But since we feed the
> video stream with the interrupts, we loose 2ms of potential ep_queue
> calls with actual payload in the worst case.
> 
> My patch is already in the stack and will be send today.
> 
> > > busy. As the dwc3 interrupt thread get to its call, the time was already
> > > over and the hw was already drained, although the started list was not
> > > yet empty, which was causing the next queued requests to be queued to
> > > late. (zero length or not)
> > > 
> > > Yes, this needed to be solved on the upper level first, by moving the
> > > long running work of the other interrupt thread to another thread or
> > > even into the userspace.
> > 
> > Right.
> > 
> > > 
> > > However I thought it would be great if we could somehow find out in
> > > the dwc3 core and make the pump mechanism more robust against such
> > > late enqueues.
> > 
> > The dwc3 core handling of events and ep_queue is relatively quick. I'm
> > all for any optimization in the dwc3 core for performance. However, I
> > don't want to just continue looking for workaround in the dwc3 core
> > without looking to solve the issue where it should be. I don't want to
> > sacrifice complexity and/or performance to other applications for just
> > UVC.
> 
> I totally understand this. And as we already found out more and more
> about the underlying complexity of the dwc3 driver I see more and more
> clearer how we have to handle the corner cases. I just started this
> conversation with Avichal and you in the other thread.
> 
> https://lore.kernel.org/all/17192e0f-7f18-49ae-96fc-71054d46f74a@xxxxxxxxxx/
> 
> I think there is some work to come. As to be sure that everybody is on
> the same page I will prepare a roadmap on how to proceed and what to
> discuss. There are many cases interfering with each other which make the
> solution pretty complex.

That's great. Let's do that so we can resolve this issue.

> 
> > > This all started with that series.
> > > 
> > > https://lore.kernel.org/all/20240307-dwc3-gadget-complete-irq-v1-0-4fe9ac0ba2b7@xxxxxxxxxxxxxx/
> > > 
> > > And patch 2 of this series did work well so far. The next move was this
> > > patch.
> > > 
> > > Since the last week debugging we found out that it got other issues.
> > > It is not allways save to read the HWO bit, from the driver.
> > > 
> > > Turns out that after an new TRB was prepared with the HWO bit set
> > > it is not save to read immideatly back from that value as the hw
> > > will be doing some operations on that exactly new prepared TRB.
> > > 
> > > We ran into this problem when applying this patch. The trb buffer list
> > > was actually filled but we hit a false positive where the latest HWO bit
> > > was 0 (probably due to the hw action in the background) and therefor
> > > went into end transfer.
> > > 
> > 
> > Thanks for the update.
> 

Thanks,
Thinh