Michael Grzeschik wrote: > On Fri, Mar 04, 2022 at 01:43:00AM +0000, Thinh Nguyen wrote: >> Michael Grzeschik wrote: >>> On Wed, Mar 02, 2022 at 10:22:55PM +0000, Thinh Nguyen wrote: >>>> Michael Grzeschik wrote: >>>>> I am currently debugging a strange error with transferring isoc >>>>> requests >>>>> with >>>>> the current mainline driver. >>>>> >>>>> When the pending and started list is empty the complete interrupt >>>>> handler is >>>>> sending an End Transfer to the controller. This End Transfer will >>>>> then get >>>>> successfully completed. >>>>> >>>>> irq/50-dwc3-305 [000] d..1. 166.661845: dwc3_gadget_ep_cmd: >>>>> ep5in: cmd 'End Transfer' [b0d08] params 00000000 00000000 00000000 >>>>> --> >>>>> status: Successful >>>>> irq/50-dwc3-305 [000] d..1. 166.661877: dwc3_event: event >>>>> (080b01d6): ep5in: Endpoint Command Complete >>>>> irq/50-dwc3-305 [000] d..1. 166.661979: dwc3_event: event >>>>> (482a10d6): ep5in: Transfer Not Ready [0000482a] (Not Active) >>>>> >>>>> >>>>> After this the Transfer needs to start again. But unfortunately this >>>>> sometimes >>>>> does not succeed. You see in the following trace that the Start of >>>>> the just >>>>> prepared trbs did complete with an -18. >>>>> >>>>> kworker/u5:2-347 [001] d..2. 166.691288: dwc3_prepare_trb: >>>>> ep5in: trb 000000007ff95a12 (E233:D232) buf 0000ffffffb1c2a8 size >>>>> 1x 12 >>>>> ctrl 00000465:00000000 (HlCs:Sc:isoc-first) >>>>> kworker/u5:2-347 [001] d..2. 166.691293: dwc3_prepare_trb: >>>>> ep5in: trb 00000000f9cd175e (E234:D232) buf 0000ffffffb1d000 size 1x >>>>> 7156 ctrl 00000471:00000000 (Hlcs:Sc:isoc) >>> >>> [snip] >>> >>>>> >>>>> The intermediate Bus Expiry is not always present in that case, but >>>>> also >>>>> completely acceptable as we loop >>>>> over the Start Transfer until this succeeds. >>>>> >>>>> Do you have any Idea why this succeeded Start Streaming does come back >>>>> with an >>>>> Missed Transfer on the first trb? >>>>> >>>> >>>> Hi Michael, >>>> >>>> When sending the trace, please leave it unfiltered. It's missing some >>>> trace events that I want to check. >>> >>> Right, it was a mix of trace_event:dwc3:* and ftrace_filter=dwc*. So I >>> kind of felt filtering out the interesting part would help. >>> >>> Anyway. Here is the same situation in another trace, with all >>> trace_events from dwc3: >>> >> >> <snip> >> >>> >>>> Please check if the host is actually polling for data at that uframe. >>>> Each TRB is scheduled for a particular interval. If the host doesn't >>>> request for data at that particular interval, then it's a missed isoc >>>> also (err -18). So, if you have a usb traffic analyzer, please verify >>>> this. Also, maybe try with a different host vendor to see if the >>>> behavior is the same. >>> >>> The behavior seems to differ if I attach different hosts. I have no >>> traffic analyzer. Would usbmon under linux host work for that? How would >> >> usbmon doesn't see usb packet level. So no, usbmon doesn't work. >> >>> I synchronize the gadget request and host urbs to know which frame >>> interval would have been the one that was missed in my logs? >> >> It will be difficult. You'd have to check the expected interval the urb >> was scheduled for from xhci driver, and you may have to check the >> MFINDEX. You'd also need to determine a point of reference that the data >> received from device is for a particular interval. Even if you have all >> this info, you'd have to assume everything the host controller reported >> is accurate and not missing anything. >> >>> >>> Is there any way the gadget driver could see if for that interval the >>> host did not poll for data? OR are there bits to get this behaviour >>> predictable and the driver to react on? >>> >> >> You can check if there any "missed service error" from host, it may give >> some clues, but not the whole picture. >> >>> Otherwise, why should on an ongoing stream the host not poll >>> for data in any interval. What would be the constraints to >>> ignore one interval. >> >> The host driver may not intentionally does it. There are many reasons >> where the host controller not poll for data, and I've seen it often. It >> can be because of the driver handling interrupt latency/delay that >> prevents the driver from feeding isoc TD/TRBs to the controller in time, >> causing underflow and starve the controller. Some other time it's the >> controller that decided the bandwidth is not sufficient for isoc >> transfer or phy signal instability and cause a missed service error etc. >> >> Anyway, in this particular case, from my experience, I don't think it's >> the host that's missing a poll. It's probably because the device >> scheduling it too aggressively early. Can you try to schedule on the 2nd >> interval instead and see if it helps in your case. >> >> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c >> index 3e75eaa13abc..b9ec7626b080 100644 >> --- a/drivers/usb/dwc3/gadget.c >> +++ b/drivers/usb/dwc3/gadget.c >> @@ -2104,7 +2104,7 @@ static int __dwc3_gadget_start_isoc(struct dwc3_ep >> *dep) >> } >> >> for (i = 0; i < DWC3_ISOC_MAX_RETRIES; i++) { >> - dep->frame_number = DWC3_ALIGN_FRAME(dep, i + 1); >> + dep->frame_number = DWC3_ALIGN_FRAME(dep, i + 2); >> >> ret = __dwc3_gadget_kick_transfer(dep); >> if (ret != -EAGAIN) >> >> > > I think I found the case: > > From my datasheet: > > <quote> > After a transfer has been started, the hardware will perform the > following functions for > IN endpoints: > > 1. Fetch TX data as early as one interval prior to the beginning of the > interval (call it A) > if the HWO bit is set to one in the TRB. > 2. Decrement the buffer size of each TRB as packets are transmitted. > 3. Retire TRBs when their buffer size has reached 0, issuing an > XferInProgress event if > the IOC bit is set. > 4. If the next interval (B) starts before all the packets have been > transmitted for interval A: > a. Flush the TXFIFO. > b. Retire the Buffer Descriptor of interval A with a "Missed > Isochronous" status. > c. Retire the Buffer Descriptor of interval B with a "Missed > Isochronous" status. > d. See Checking interval status for a description of how software > can determine that > an interval ended unexpectedly. > e. Go to step 1 to prepare for interval C > 5. Otherwise, if all the TRBs of interval A are completed, the hardware > will prepare for > interval B. > </quote> > > Could it be that in the observed case we just run directly into case 4 > after issueing Stream Start. Since the frame interval we programm > is so close to the current running frame interval. Yes, that's what the code snippet I sent to confirm the case was for. Typically, the Start Transfer command status "bus expiry" will tell the driver of this case, but that may not be enough. > > So actually applying the below patch would totally make sense. We could > even > increase that to an higher number, just to make sure that we would not > run into > an partially started frame interval when issueing the Start Transfer. > See my other email to make a patch. Thanks, Thinh