Re: DWC3 Isoc Gadget Missed Interrupts on Start Transfer

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 04, 2022 at 01:43:00AM +0000, Thinh Nguyen wrote:
Michael Grzeschik wrote:
On Wed, Mar 02, 2022 at 10:22:55PM +0000, Thinh Nguyen wrote:
Michael Grzeschik wrote:
I am currently debugging a strange error with transferring isoc requests
with
the current mainline driver.

When the pending and started list is empty the complete interrupt
handler is
sending an End Transfer to the controller. This End Transfer will
then get
successfully completed.

    irq/50-dwc3-305     [000] d..1.   166.661845: dwc3_gadget_ep_cmd:
ep5in: cmd 'End Transfer' [b0d08] params 00000000 00000000 00000000 -->
status: Successful
    irq/50-dwc3-305     [000] d..1.   166.661877: dwc3_event: event
(080b01d6): ep5in: Endpoint Command Complete
    irq/50-dwc3-305     [000] d..1.   166.661979: dwc3_event: event
(482a10d6): ep5in: Transfer Not Ready [0000482a] (Not Active)


After this the Transfer needs to start again. But unfortunately this
sometimes
does not succeed. You see in the following trace that the Start of
the just
prepared trbs did complete with an -18.

   kworker/u5:2-347     [001] d..2.   166.691288: dwc3_prepare_trb:
ep5in: trb 000000007ff95a12 (E233:D232) buf 0000ffffffb1c2a8 size 1x 12
ctrl 00000465:00000000 (HlCs:Sc:isoc-first)
   kworker/u5:2-347     [001] d..2.   166.691293: dwc3_prepare_trb:
ep5in: trb 00000000f9cd175e (E234:D232) buf 0000ffffffb1d000 size 1x
7156 ctrl 00000471:00000000 (Hlcs:Sc:isoc)

[snip]


The intermediate Bus Expiry is not always present in that case, but also
completely acceptable as we loop
over the Start Transfer until this succeeds.

Do you have any Idea why this succeeded Start Streaming does come back
with an
Missed Transfer on the first trb?


Hi Michael,

When sending the trace, please leave it unfiltered. It's missing some
trace events that I want to check.

Right, it was a mix of trace_event:dwc3:* and ftrace_filter=dwc*. So I
kind of felt filtering out the interesting part would help.

Anyway. Here is the same situation in another trace, with all
trace_events from dwc3:


<snip>


Please check if the host is actually polling for data at that uframe.
Each TRB is scheduled for a particular interval. If the host doesn't
request for data at that particular interval, then it's a missed isoc
also (err -18). So, if you have a usb traffic analyzer, please verify
this. Also, maybe try with a different host vendor to see if the
behavior is the same.

The behavior seems to differ if I attach different hosts. I have no
traffic analyzer. Would usbmon under linux host work for that? How would

usbmon doesn't see usb packet level. So no, usbmon doesn't work.

I synchronize the gadget request and host urbs to know which frame
interval would have been the one that was missed in my logs?

It will be difficult. You'd have to check the expected interval the urb
was scheduled for from xhci driver, and you may have to check the
MFINDEX. You'd also need to determine a point of reference that the data
received from device is for a particular interval. Even if you have all
this info, you'd have to assume everything the host controller reported
is accurate and not missing anything.


Is there any way the gadget driver could see if for that interval the
host did not poll for data? OR are there bits to get this behaviour
predictable and the driver to react on?


You can check if there any "missed service error" from host, it may give
some clues, but not the whole picture.

Otherwise, why should on an ongoing stream the host not poll
for data in any interval. What would be the constraints to
ignore one interval.

The host driver may not intentionally does it. There are many reasons
where the host controller not poll for data, and I've seen it often. It
can be because of the driver handling interrupt latency/delay that
prevents the driver from feeding isoc TD/TRBs to the controller in time,
causing underflow and starve the controller. Some other time it's the
controller that decided the bandwidth is not sufficient for isoc
transfer or phy signal instability and cause a missed service error etc.

Anyway, in this particular case, from my experience, I don't think it's
the host that's missing a poll. It's probably because the device
scheduling it too aggressively early. Can you try to schedule on the 2nd
interval instead and see if it helps in your case.

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 3e75eaa13abc..b9ec7626b080 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2104,7 +2104,7 @@ static int __dwc3_gadget_start_isoc(struct dwc3_ep
*dep)
       }

       for (i = 0; i < DWC3_ISOC_MAX_RETRIES; i++) {
-               dep->frame_number = DWC3_ALIGN_FRAME(dep, i + 1);
+               dep->frame_number = DWC3_ALIGN_FRAME(dep, i + 2);

               ret = __dwc3_gadget_kick_transfer(dep);
               if (ret != -EAGAIN)



I think I found the case:

From my datasheet:

<quote>
 After a transfer has been started, the hardware will perform the following functions for
 IN endpoints:

 1. Fetch TX data as early as one interval prior to the beginning of the interval (call it A)
    if the HWO bit is set to one in the TRB.
 2. Decrement the buffer size of each TRB as packets are transmitted.
 3. Retire TRBs when their buffer size has reached 0, issuing an XferInProgress event if
    the IOC bit is set.
 4. If the next interval (B) starts before all the packets have been transmitted for interval A:
    a. Flush the TXFIFO.
    b. Retire the Buffer Descriptor of interval A with a "Missed Isochronous" status.
    c. Retire the Buffer Descriptor of interval B with a "Missed Isochronous" status.
    d. See Checking interval status for a description of how software can determine that
       an interval ended unexpectedly.
    e. Go to step 1 to prepare for interval C
 5. Otherwise, if all the TRBs of interval A are completed, the hardware will prepare for
    interval B.
</quote>

Could it be that in the observed case we just run directly into case 4
after issueing Stream Start. Since the frame interval we programm
is so close to the current running frame interval.

So actually applying the below patch would totally make sense. We could even
increase that to an higher number, just to make sure that we would not run into
an partially started frame interval when issueing the Start Transfer.


Thanks,
Michael

--
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux