Re: USB network gadget / DWC3 issue

Thinh Nguyen wrote:
> Ferry Toth wrote:
>> Hi,
>>
>> On 03-04-2021 at 23:15, Ferry Toth wrote:
>>> Hi,
>>>
>>> On 03-04-2021 at 13:25, Ferry Toth wrote:
>>>> Hi,
>>>>
>>>> On 03-04-2021 at 04:02, Thinh Nguyen wrote:
>>>>> Ferry Toth wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 02-04-2021 at 22:16, Thinh Nguyen wrote:
>>>>>>> Ferry Toth wrote:
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> On 30-03-2021 at 23:57, Ferry Toth wrote:
>>>>>>>>> Hi
>>>>>>>>>
>>>>>>>>> On 30-03-2021 at 22:26, Ferry Toth wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> On 30-03-2021 at 18:17, Felipe Balbi wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Andy Shevchenko <andy.shevchenko@xxxxxxxxx> writes:
>>>>>>>>>>>> Hi!
>>>>>>>>>>>>
>>>>>>>>>>>> I have a platform with DWC3 in Dual Role mode. Currently I'm
>>>>>>>>>>>> experimenting on v5.12-rc5 with a few patches (mostly
>>>>>>>>>>>> configuration) applied [1]. I'm using Debian Unstable on the
>>>>>>>>>>>> host machine and BuildRoot with the above-mentioned kernel on
>>>>>>>>>>>> the target.
>>>>>>>>>>>>
>>>>>>>>>>>> So, scenario 0:
>>>>>>>>>>>> 1. Run iperf3 -s on target
>>>>>>>>>>>> 2. Run iperf3 -c ... -t 0 on the host
>>>>>>>>>>>> 3.  0.00-10.36  sec   237 MBytes  192 Mbits/sec  receiver
>>>>>>>>>>>>
>>>>>>>>>>>> Scenario 1:
>>>>>>>>>>>> 1. Now, detach USB cable, wait for several seconds, attach it
>>>>>>>>>>>> back, repeat above:
>>>>>>>>>>>> 0.00-9.94   sec   209 MBytes   176 Mbits/sec  receiver
>>>>>>>>>>>>
>>>>>>>>>>>> Note the bandwidth drop (176 vs. 192).
>>>>>>>>>>>>
>>>>>>>>>>>> (Repeating scenario 1 now gives the same result)
>>>>>>>>>>>>
>>>>>>>>>>>> Scenario 2:
>>>>>>>>>>>> 1. Detach USB cable, attach a device, for example a USB stick
>>>>>>>>>>>> 2. See it being enumerated and detach it
>>>>>>>>>>>> 3. Attach cable from host
>>>>>>>>>>>> 4.  0.00-19.36  sec   315 MBytes   136 Mbits/sec  receiver
>>>>>>>>>>>>
>>>>>>>>>>>> Note the even larger bandwidth drop!
>>>>>>>>>>>>
>>>>>>>>>>>> (Repeating scenario 1 keeps the same lower bandwidth)
>>>>>>>>>>>>
>>>>>>>>>>>> NOTE, sometimes in this scenario the target simply reboots
>>>>>>>>>>>> after several seconds (w/o any logs [from kernel] printed)!
>>>>>>>>>>>>
>>>>>>>>>>>> So, any pointers on how to debug, and what could be the
>>>>>>>>>>>> smoking gun here?
>>>>>>>>>>>>
>>>>>>>>>>>> Ferry reported this in [2]. There are different kernel
>>>>>>>>>>>> versions and tools to establish the connection (like connman
>>>>>>>>>>>> vs. none in my case).
>>>>>>>>>>>>
>>>>>>>>>>>> [1]:
>>>>>>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/__;!!A4F2R9G_pg!KpQnudHIK6XgK6HbPaqtbVgipDmkNBWewo-euAIuBlGdtSiaQiJ8jLn9OoMEppG6qq-d$
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> [2]:
>>>>>>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/issues/31__;!!A4F2R9G_pg!KpQnudHIK6XgK6HbPaqtbVgipDmkNBWewo-euAIuBlGdtSiaQiJ8jLn9OoMEptMCrp-F$
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> dwc3 tracepoints should give some initial hints. Look at packet
>>>>>>>>>>> sizes and period of transmission. From the dwc3 side, I can't
>>>>>>>>>>> think of anything we would do to throttle the transmission, but
>>>>>>>>>>> tracepoints should tell a clearer story.
>>>>>>>>>>>
>>>>>>>>>> My testing (but yes, with a different kernel and network managed
>>>>>>>>>> by connman) shows:
>>>>>>>>>>
>>>>>>>>>> 1) on cold boot the eem network gadget works fine
>>>>>>>>>>
>>>>>>>>>> 2) after unplug or warm reboot (which is also an unplug) it's
>>>>>>>>>> broken: speed is lost (12.0 Mbits/sec, down from the normal
>>>>>>>>>> 200 Mb/s), packets are lost, no configuration is received from
>>>>>>>>>> dhcp, there is an occasional reboot, and the only way to fix it
>>>>>>>>>> is a cold boot
>>>>>>>>>>
>>>>>>>>>> 3) if I run `connmanctl disable gadget` before the unplug, it
>>>>>>>>>> works fine after replugging and enabling
>>>>>>>>>>
>>>>>>>>>> My theory is that some HW register is disturbed on a surprise
>>>>>>>>>> unplug and is not reset on plug or warm boot, but is cleared on
>>>>>>>>>> cold boot. Maybe that can help to narrow down tracepoints?
>>>>>>>>>>
>>>>>>>>> I captured a plug after warm boot and after cold boot. This
>>>>>>>>> includes network setup (dhcp). You can find it in [2] or via
>>>>>>>>> this direct link:
>>>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/files/6232410/boot.zip
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> While the above traces in boot.zip allow comparing which regs are
>>>>>>>> not correctly initialized on warm boot, I have now captured traces
>>>>>>>> of unplug/plug.
>>>>>>>>
>>>>>>>> Here the kernel is 5.10.27 (LTS), cold booted with the USB cable
>>>>>>>> plugged in and the eem gadget network set up (dhcp). I then traced
>>>>>>>> an unplug, followed by a plug.
>>>>>>>>
>>>>>>>> After the plug the eem connection is again broken.
>>>>>>>>
>>>>>>>> This might allow figuring out what goes wrong on unplug. Traces
>>>>>>>> here:
>>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/files/6250924/plug-unplug.zip
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Were you able to narrow down the issue to only the DWC3 device?
>>>>>>> (i.e. have you tested with different hosts and different device
>>>>>>> controllers to confirm this?)
>>>>>> I haven't tried with other devices. I have been forced to replace my
>>>>>> host mobo and nothing changed. But I didn't pay attention to the
>>>>>> particular host controller.
>>>>>>
>>>>> It'd be better if we can narrow down the culprit as this seems to me
>>>>> like a synchronization issue at the upper layer between the host and
>>>>> device.
>>>>>
>>>>>>> Did you see this issue previously? If not, is it possible to do git
>>>>>>> bisection?
>>>>>> This is with Intel Edison, where mainline usb gadget support appeared
>>>>>> around 4.19 iirc. I believed the problem appeared between 5.4 and 5.7
>>>>>> and tried to bisect but failed.
>>>>>>
>>>>>> I realize only now that I failed because:
>>>>>> 1) 5.4 already has this issue as I recently retested
>>>>> I'm confused, why do you believe the problem is between 5.4 and 5.7 if
>>>>> 5.4 already has this issue? So when did you start seeing this problem?
>>>>
>>>> Because at the time of 5.4 I didn't notice the issue, as I normally
>>>> did cold boots due to other problems on warm boot (i.e. sdhc
>>>> inaccessible).
>>>>
>>>> I never knew that it works on a cold boot. Even during bisecting I
>>>> didn't know until the end, and then I found 5.4 has the same problem
>>>> as all the later kernels (tested up to 5.11).
>>>>
>>>>> Also, these kernel versions are really old, there's been a lot of
>>>>> updates/fixes to dwc3 since then. Can we run tests on the latest
>>>>> kernel?
>>>>
>>>> I have tested 5.10.27, 5.11.0 and 5.11.4-rt11.
>>>>
>>>> But of course I am completely prepared to run Andy's latest
>>>> (v5.12-rc5) on the device.
>>>>
>>>>>> 2) I didn't use a reproducible criterion. After warm reboot the eem
>>>>>> gadget fails, but you can flip the host/gadget switch back and
>>>>>> forth and have the illusion that the connection is restored.
>>>>>>
>>>>>> The scenario described here is reproducible: leaving the switch in
>>>>>> gadget mode, eem works after cold boot only. And it likely breaks on
>>>>>> unplug.
>>>>>>
>>>>>> A 2nd hint is that disabling the gadget (I used `connmanctl disable
>>>>>> gadget`, but I believe that has the same effect as
>>>>>> `ip link set dev usb0 down`) before unplug prevents messing up the
>>>>>> driver, so you can replug and enable again.
>>>>> These data points are good. However, we'd need to know where to look
>>>>> first. The issue isn't obvious from the DWC3 controller or the DWC3
>>>>> driver.
>>>>>
>>>>> Can you check a few things:
>>>>> 1) Any error/timeout messages from the host's dmesg? Or device side?
>>>>
>>>> I'll add log from the host side.
>>>>
>>>> For now I only see (on a warm plug):
>>>>
>>>> kernel: usb 1-11: can't set config #1, error -110
>>>>
>>>>> 2) What kernel version is your host using? Can you use the latest for
>>>>> both host and device?
>>>>
>>>> The host is ubuntu's amd64 5.8.0-48-generic.
>>>>
>>>> I will test with v5.12-rc5  from ubuntu kernel ppa on the host. And
>>>> Andy's latest (v5.12-rc5) on the device.
>>>
>>> I upgraded the host kernel, but not yet the device, and captured the
>>> relevant host journal messages and device traces. Something did
>>> change: after cold boot I don't get an eem connection until after I
>>> unplug/replug. I then traced an iperf transfer. Then, after another
>>> unplug/replug, I get the throttled connection, which I also traced.
>>>
>>> See
>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/files/6253414/transfer.zip__;!!A4F2R9G_pg!IYAMgA0GMmo4BNpeXScb5Aix0IrxsxJhdCh9d5-75fAnJtSwcSG5e1az-x_101A7wlQD$
>>>
>>
>> Now, with host updated to ubuntu kernel ppa 5.12.0-051200rc5-generic and
>> edison to 5.12.0-rc5-edison-acpi-standard vanilla + 2 patches appearing
>> in rc6:
>>
>> * "usb: dwc3: gadget: Clear DEP flags after stop transfers in ep disable"
>> * "usb: dwc3: pci: Enable dis_uX_susphy_quirk for Intel Merrifield"
>>
>> plus one from
>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/commits/eds-acpi__;!!A4F2R9G_pg!IYAMgA0GMmo4BNpeXScb5Aix0IrxsxJhdCh9d5-75fAnJtSwcSG5e1az-x_105iEe4TE$
>>
>> * "TODO: driver core: Break infinite loop when deferred probe can't be
>> satisfied"
>>
>> I captured one good and one bad connection, plus logs on the host side;
>> see journalctl-plus-comments.txt in
>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/files/6260614/5.12-rc5.zip__;!!A4F2R9G_pg!IYAMgA0GMmo4BNpeXScb5Aix0IrxsxJhdCh9d5-75fAnJtSwcSG5e1az-x_10w5OhpD1$
>>
>>>
>>>> I am expecting results this evening.
>>>>
>>>>> 3) Snapshot of dwc3 tracepoints of active transfers between the normal
>>>>> vs throttled of the latest kernel
>>>>
>>>> I don't know if the problem I see is really throttling.
>>>>
>>>> I can trace an active transfer, but that does actually throttle from
>>>> 200Mb/s down to 139Mb/s and produces a trace of 53MB (2x1sec of
>>>> iperf3).
>>>>
> 
> 
> I took a look at the "bad" and "normal" tracepoints. There are a few
> 1-second delays where the host tried to bring the device back and
> resume from low power:
> 
>      ksoftirqd/0-10      [000] d.s.   231.501808: dwc3_gadget_ep_cmd: ep3in: cmd 'Update Transfer' [60007] params 00000000 00000000 00000000 --> status: Successful
>      ksoftirqd/0-10      [000] d.s.   231.501809: dwc3_readl: addr 00000000d68ecd36 value 0000a610
>      ksoftirqd/0-10      [000] d.s.   231.501810: dwc3_writel: addr 00000000d68ecd36 value 0000a710
>           <idle>-0       [000] d.h.   232.499418: dwc3_readl: addr 00000000a15e0e35 value 00000034
>           <idle>-0       [000] d.h.   232.499423: dwc3_readl: addr 00000000bb67b585 value 00001000
>           <idle>-0       [000] d.h.   232.499425: dwc3_writel: addr 00000000bb67b585 value 80001000
>           <idle>-0       [000] d.h.   232.499427: dwc3_writel: addr 00000000a15e0e35 value 00000034
>      irq/15-dwc3-476     [000] d...   232.499480: dwc3_event: event (00000401): WakeUp [U0]
>      irq/15-dwc3-476     [000] d...   232.499492: dwc3_event: event (00000401): WakeUp [U0]
>      irq/15-dwc3-476     [000] d...   232.499496: dwc3_event: event (00006088): ep2out: Transfer In Progress [0] (SIm)
>      irq/15-dwc3-476     [000] d...   232.499501: dwc3_complete_trb: ep2out: trb 00000000c7ce524e (E179:D170) buf 0000000008273540 size 1463 ctrl 00000818 (hlcS:sC:normal)
>      irq/15-dwc3-476     [000] d...   232.499518: dwc3_gadget_giveback: ep2out: req 0000000012e296cf length 73/1536 zsI ==> 0
>      irq/15-dwc3-476     [000] d...   232.499562: dwc3_ep_queue: ep2out: req 0000000012e296cf length 0/1536 zsI ==> -115
>      irq/15-dwc3-476     [000] d...   232.499601: dwc3_prepare_trb: ep2out: trb 000000008c083777 (E180:D170) buf 0000000002a7e9c0 size 1536 ctrl 00000819 (HlcS:sC:normal)
> 
> 
> Your device is operating in high speed, right? Try to turn off LPM from
> the host and see if that helps with the speed throttling issue. (If
> you're using an xHCI host, then set XHCI_HW_LPM_DISABLE.) It may also
> help with the connection issue you saw.
> 
> It seems to be an issue on the host side, but I can't tell for sure
> unless we have a USB traffic analyzer that shows what's going on. Have
> you tried different hosts?
> 
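
The 1-second delays in the trace excerpt above can be located mechanically in a large capture rather than by eye. Below is a minimal Python sketch, assuming the default ftrace text layout seen in the excerpt (a `seconds.microseconds:` timestamp followed by the event name). The function name `find_gaps` and the 0.5 s threshold are illustrative choices, not part of any kernel tooling.

```python
import re

# Timestamp and event name of an ftrace text line, e.g.
# "... d.s.   231.501810: dwc3_writel: addr ... value ..."
TS_RE = re.compile(r"\s(\d+\.\d+):\s+(\w+):")


def find_gaps(lines, threshold=0.5):
    """Return (prev_ts, ts, event) for each pair of consecutive trace
    lines whose timestamps differ by more than `threshold` seconds."""
    gaps = []
    prev = None
    for line in lines:
        m = TS_RE.search(line)
        if not m:
            continue  # skip headers and lines without a timestamp
        ts = float(m.group(1))
        if prev is not None and ts - prev > threshold:
            gaps.append((prev, ts, m.group(2)))
        prev = ts
    return gaps


# Four lines copied from the excerpt quoted above.
SAMPLE = """\
     ksoftirqd/0-10      [000] d.s.   231.501809: dwc3_readl: addr 00000000d68ecd36 value 0000a610
     ksoftirqd/0-10      [000] d.s.   231.501810: dwc3_writel: addr 00000000d68ecd36 value 0000a710
          <idle>-0       [000] d.h.   232.499418: dwc3_readl: addr 00000000a15e0e35 value 00000034
     irq/15-dwc3-476     [000] d...   232.499480: dwc3_event: event (00000401): WakeUp [U0]
""".splitlines()

for prev_ts, ts, event in find_gaps(SAMPLE):
    print(f"{ts - prev_ts:.3f} s idle before {event} at {ts:.6f}")
```

Running the same function over the full "good" and "bad" trace files and comparing the number of gaps should show whether the idle periods only occur in the throttled case.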

You can also disable LPM from the gadget side by setting
dwc->dis_enblslpm_quirk.
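
For a quick host-side experiment that avoids rebuilding the host kernel, USB2 hardware LPM can, as far as I know, also be toggled per device through the `power/usb2_hardware_lpm` sysfs attribute described in Documentation/ABI/testing/sysfs-bus-usb. This is a different mechanism from the two quirks named above and has not been tried in this thread. A minimal Python sketch follows; the helper name `set_usb2_hw_lpm` is hypothetical, and the device name "1-11" is taken from the host log quoted earlier ("usb 1-11: can't set config #1").

```python
from pathlib import Path


def set_usb2_hw_lpm(device, enable, sysfs_root="/sys/bus/usb/devices"):
    """Write 'y' or 'n' to <sysfs_root>/<device>/power/usb2_hardware_lpm
    and return the path that was written.

    Assumption: the attribute only exists when the host controller and
    the device both support USB2 hardware LPM, so its absence is
    reported instead of the file being created.
    """
    attr = Path(sysfs_root) / device / "power" / "usb2_hardware_lpm"
    if not attr.exists():
        raise FileNotFoundError(
            f"{attr} not present; USB2 hardware LPM may not be supported"
        )
    attr.write_text("y" if enable else "n")
    return attr


# Example (needs root on the host, with the gadget attached):
#   set_usb2_hw_lpm("1-11", enable=False)
```

The setting applies to the currently enumerated device only, so it would need to be reapplied after each replug.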

BR,
Thinh



