Thinh Nguyen wrote:
> Ferry Toth wrote:
>> Hi,
>>
>> On 03-04-2021 at 23:15, Ferry Toth wrote:
>>> Hi,
>>>
>>> On 03-04-2021 at 13:25, Ferry Toth wrote:
>>>> Hi,
>>>>
>>>> On 03-04-2021 at 04:02, Thinh Nguyen wrote:
>>>>> Ferry Toth wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 02-04-2021 at 22:16, Thinh Nguyen wrote:
>>>>>>> Ferry Toth wrote:
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> On 30-03-2021 at 23:57, Ferry Toth wrote:
>>>>>>>>> Hi
>>>>>>>>>
>>>>>>>>> On 30-03-2021 at 22:26, Ferry Toth wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> On 30-03-2021 at 18:17, Felipe Balbi wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Andy Shevchenko <andy.shevchenko@xxxxxxxxx> writes:
>>>>>>>>>>>> Hi!
>>>>>>>>>>>>
>>>>>>>>>>>> I have a platform with DWC3 in Dual Role mode. Currently I'm
>>>>>>>>>>>> experimenting on v5.12-rc5 with a few patches (mostly
>>>>>>>>>>>> configuration) applied [1]. I'm using Debian Unstable on the
>>>>>>>>>>>> host machine and BuildRoot with the above mentioned kernel on
>>>>>>>>>>>> the target.
>>>>>>>>>>>>
>>>>>>>>>>>> So, scenario 0:
>>>>>>>>>>>> 1. Run iperf3 -s on target
>>>>>>>>>>>> 2. Run iperf3 -c ... -t 0 on the host
>>>>>>>>>>>> 3. 0.00-10.36 sec 237 MBytes 192 Mbits/sec receiver
>>>>>>>>>>>>
>>>>>>>>>>>> Scenario 1:
>>>>>>>>>>>> 1. Now, detach USB cable, wait for several seconds, attach it
>>>>>>>>>>>> back, repeat above:
>>>>>>>>>>>> 0.00-9.94 sec 209 MBytes 176 Mbits/sec receiver
>>>>>>>>>>>>
>>>>>>>>>>>> Note the bandwidth drop (176 vs. 192).
>>>>>>>>>>>>
>>>>>>>>>>>> (Repeating scenario 1 will now give the same result)
>>>>>>>>>>>>
>>>>>>>>>>>> Scenario 2:
>>>>>>>>>>>> 1. Detach USB cable, attach a device, for example USB stick,
>>>>>>>>>>>> 2. See it being enumerated and detach it.
>>>>>>>>>>>> 3. Attach cable from host
>>>>>>>>>>>> 4. 0.00-19.36 sec 315 MBytes 136 Mbits/sec receiver
>>>>>>>>>>>>
>>>>>>>>>>>> Note even more bandwidth drop!
>>>>>>>>>>>>
>>>>>>>>>>>> (Repeating scenario 1 keeps the same lower bandwidth)
>>>>>>>>>>>>
>>>>>>>>>>>> NOTE, sometimes in this scenario after several seconds the
>>>>>>>>>>>> target simply reboots (w/o any logs [from kernel] printed)!
>>>>>>>>>>>>
>>>>>>>>>>>> So, any pointers on how to debug and what can be a smoking
>>>>>>>>>>>> gun here?
>>>>>>>>>>>>
>>>>>>>>>>>> Ferry reported this in [2]. There are different kernel
>>>>>>>>>>>> versions and tools to establish the connection (like connman
>>>>>>>>>>>> vs. none in my case).
>>>>>>>>>>>>
>>>>>>>>>>>> [1]:
>>>>>>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/__;!!A4F2R9G_pg!KpQnudHIK6XgK6HbPaqtbVgipDmkNBWewo-euAIuBlGdtSiaQiJ8jLn9OoMEppG6qq-d$
>>>>>>>>>>>>
>>>>>>>>>>>> [2]:
>>>>>>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/issues/31__;!!A4F2R9G_pg!KpQnudHIK6XgK6HbPaqtbVgipDmkNBWewo-euAIuBlGdtSiaQiJ8jLn9OoMEptMCrp-F$
>>>>>>>>>>>>
>>>>>>>>>>> dwc3 tracepoints should give some initial hints. Look at packet
>>>>>>>>>>> sizes and period of transmission. From dwc3 side, I can't think
>>>>>>>>>>> of anything we would do to throttle the transmission, but
>>>>>>>>>>> tracepoints should tell a clearer story.
>>>>>>>>>>>
>>>>>>>>>> My testing (but yes, with a different kernel and network managed
>>>>>>>>>> by connman) shows:
>>>>>>>>>>
>>>>>>>>>> 1) on cold boot eem network gadget works fine
>>>>>>>>>>
>>>>>>>>>> 2) after unplug or warm reboot (which is also an unplug) it's
>>>>>>>>>> broken, speed is lost (12.0 Mbits/sec from 200 Mb/s normally),
>>>>>>>>>> packets lost, no configuration received from dhcp, occasional
>>>>>>>>>> reboot, only way to fix is cold boot
>>>>>>>>>>
>>>>>>>>>> 3) if before unplug `connmanctl disable gadget`, on replugging
>>>>>>>>>> and enabling it works fine
>>>>>>>>>>
>>>>>>>>>> My theory is that some HW register is disturbed on a surprise
>>>>>>>>>> unplug, but not reset on plug or warm boot. But on cold boot it
>>>>>>>>>> is cleared. Maybe that can help to narrow down tracepoints?
>>>>>>>>>>
>>>>>>>>> I captured a plug after warm and after cold boot. This includes
>>>>>>>>> network setup (dhcp). You can find it in [2] or direct link here:
>>>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/files/6232410/boot.zip
>>>>>>>>>
>>>>>>>>
>>>>>>>> While the above traces in boot.zip allow comparing which regs are
>>>>>>>> not correctly initialized on warm boot, I have now captured traces
>>>>>>>> of unplug/plug.
>>>>>>>>
>>>>>>>> Here kernel is 5.10.27 (LTS), cold booted with USB cable plugged
>>>>>>>> and the eem gadget network setup (dhcp). Then trace unplug. Then
>>>>>>>> trace plug.
>>>>>>>>
>>>>>>>> After plug the eem connection is again broken.
>>>>>>>>
>>>>>>>> This might allow figuring out what goes wrong on unplug. Traces
>>>>>>>> here:
>>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/files/6250924/plug-unplug.zip
>>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Were you able to narrow down the issue to only DWC3 device? (i.e. you
>>>>>>> tested with different hosts and different device controllers to
>>>>>>> confirm this)
>>>>>> I haven't tried with other devices. I have been forced to replace my
>>>>>> host mobo and nothing changed. But I didn't pay attention to the
>>>>>> particular host controller.
>>>>>>
>>>>> It'd be better if we can narrow down the culprit as this seems to me
>>>>> like a synchronization issue at the upper layer between the host and
>>>>> device.
>>>>>
>>>>>>> Did you see this issue previously? If not, is it possible to do git
>>>>>>> bisection?
>>>>>> This is with Intel Edison where mainline usb gadget support appeared
>>>>>> around 4.19 iirc. I believed the problem appeared between 5.4 and 5.7
>>>>>> and tried to bisect but failed.
>>>>>>
>>>>>> I realize only now that I failed because:
>>>>>> 1) 5.4 already has this issue as I recently retested
>>>>> I'm confused, why do you believe the problem is between 5.4 and 5.7 if
>>>>> 5.4 already has this issue? So when did you start seeing this problem?
>>>>
>>>> Because at the time of 5.4 I didn't notice the issue as I normally
>>>> did cold boots due to other problems on warm boot (i.e. sdhc
>>>> inaccessible).
>>>>
>>>> I never knew that on a cold boot it works. Even during bisecting I
>>>> didn't know until the end, and then I found 5.4 has the same problem
>>>> as all the later kernels (tested up to 5.11)
>>>>
>>>>> Also, these kernel versions are really old, there's been a lot of
>>>>> updates/fixes to dwc3 since then. Can we run tests on the latest
>>>>> kernel?
>>>>
>>>> I have tested 5.10.27, 5.11.0 and 5.11.4-rt11.
>>>>
>>>> But of course I am completely prepared to run Andy's latest
>>>> (v5.12-rc5) on the device.
>>>>
>>>>>> 2) I didn't use a reproducible criterion. After warm reboot the eem
>>>>>> gadget fails, but you can flip the host/gadget switch back and
>>>>>> forth and have the illusion that the connection is restored.
>>>>>>
>>>>>> The scenario described here is reproducible: leaving the switch in
>>>>>> gadget mode eem works after cold boot only. And it likely breaks on
>>>>>> unplug.
>>>>>>
>>>>>> A 2nd hint is that disabling gadget (I used `connmanctl disable gadget`
>>>>>> but I believe that has the same effect as `ip link set dev usb0 down`)
>>>>>> before unplug prevents messing up the driver, so you can replug and
>>>>>> enable again.
>>>>> These data points are good. However, we'd need to know where to look
>>>>> first. The issue isn't obvious from the DWC3 controller or the DWC3
>>>>> driver.
>>>>>
>>>>> Can you check a few things:
>>>>> 1) Any error/timeout messages from the host's dmesg? Or device side?
>>>>
>>>> I'll add log from the host side.
>>>>
>>>> For now I only see (on a warm plug):
>>>>
>>>> kernel: usb 1-11: can't set config #1, error -110
>>>>
>>>>> 2) What kernel version is your host using? Can you use the latest for
>>>>> both host and device?
>>>>
>>>> The host is ubuntu's amd64 5.8.0-48-generic.
>>>>
>>>> I will test with v5.12-rc5 from ubuntu kernel ppa on the host. And
>>>> Andy's latest (v5.12-rc5) on the device.
>>>
>>> I upgraded host kernel, but not yet device and captured relevant host
>>> journal messages and device traces. Something did change: after cold
>>> boot I don't get an eem connection until after I unplug/replug. I then
>>> traced an iperf transfer. Then after again unplug/replug I get the
>>> throttled connection, which I also traced.
>>>
>>> See
>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/files/6253414/transfer.zip__;!!A4F2R9G_pg!IYAMgA0GMmo4BNpeXScb5Aix0IrxsxJhdCh9d5-75fAnJtSwcSG5e1az-x_101A7wlQD$
>>>
>>
>> Now, with host updated to ubuntu kernel ppa 5.12.0-051200rc5-generic and
>> edison to 5.12.0-rc5-edison-acpi-standard vanilla + 2 patches appearing
>> in rc6:
>>
>> * "usb: dwc3: gadget: Clear DEP flags after stop transfers in ep disable"
>> * "usb: dwc3: pci: Enable dis_uX_susphy_quirk for Intel Merrifield"
>>
>> plus one from
>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/commits/eds-acpi__;!!A4F2R9G_pg!IYAMgA0GMmo4BNpeXScb5Aix0IrxsxJhdCh9d5-75fAnJtSwcSG5e1az-x_105iEe4TE$ ;
>>
>> * "TODO: driver core: Break infinite loop when deferred probe can't be
>> satisfied"
>>
>> I captured one good and one bad connection, plus logs on the host side;
>> see journalctl-plus-comments.txt in
>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/files/6260614/5.12-rc5.zip__;!!A4F2R9G_pg!IYAMgA0GMmo4BNpeXScb5Aix0IrxsxJhdCh9d5-75fAnJtSwcSG5e1az-x_10w5OhpD1$
>>
>>>
>>>> I am expecting results this evening.
>>>>
>>>>> 3) Snapshot of dwc3 tracepoints of active transfers between the normal
>>>>> vs throttled of the latest kernel
>>>>
>>>> I don't know if the problem I see is really throttling.
>>>>
>>>> I can trace an active transfer, but that does actually throttle from
>>>> 200 Mb/s down to 139 Mb/s and produces a trace of 53 MB. (2x1sec of
>>>> iperf3).
>>>>
>
>
> I took a look at the "bad" and "normal" tracepoints. There are a few
> 1-second delays where the host tried to bring the device back and
> resume from low power:
>
> ksoftirqd/0-10 [000] d.s. 231.501808: dwc3_gadget_ep_cmd: ep3in: cmd 'Update Transfer' [60007] params 00000000 00000000 00000000 --> status: Successful
> ksoftirqd/0-10 [000] d.s. 231.501809: dwc3_readl: addr 00000000d68ecd36 value 0000a610
> ksoftirqd/0-10 [000] d.s. 231.501810: dwc3_writel: addr 00000000d68ecd36 value 0000a710
> <idle>-0 [000] d.h. 232.499418: dwc3_readl: addr 00000000a15e0e35 value 00000034
> <idle>-0 [000] d.h. 232.499423: dwc3_readl: addr 00000000bb67b585 value 00001000
> <idle>-0 [000] d.h. 232.499425: dwc3_writel: addr 00000000bb67b585 value 80001000
> <idle>-0 [000] d.h. 232.499427: dwc3_writel: addr 00000000a15e0e35 value 00000034
> irq/15-dwc3-476 [000] d... 232.499480: dwc3_event: event (00000401): WakeUp [U0]
> irq/15-dwc3-476 [000] d... 232.499492: dwc3_event: event (00000401): WakeUp [U0]
> irq/15-dwc3-476 [000] d... 232.499496: dwc3_event: event (00006088): ep2out: Transfer In Progress [0] (SIm)
> irq/15-dwc3-476 [000] d... 232.499501: dwc3_complete_trb: ep2out: trb 00000000c7ce524e (E179:D170) buf 0000000008273540 size 1463 ctrl 00000818 (hlcS:sC:normal)
> irq/15-dwc3-476 [000] d... 232.499518: dwc3_gadget_giveback: ep2out: req 0000000012e296cf length 73/1536 zsI ==> 0
> irq/15-dwc3-476 [000] d... 232.499562: dwc3_ep_queue: ep2out: req 0000000012e296cf length 0/1536 zsI ==> -115
> irq/15-dwc3-476 [000] d... 232.499601: dwc3_prepare_trb: ep2out: trb 000000008c083777 (E180:D170) buf 0000000002a7e9c0 size 1536 ctrl 00000819 (HlcS:sC:normal)
>
>
> Your device is operating in high speed, right? Try to turn off LPM from
> host and see if that helps with the speed throttling issue. (If you're
> using xHCI host, then set XHCI_HW_LPM_DISABLE). It may also help with
> the connection issue you saw.
>
> It seems to be an issue from host, but I can't tell for sure unless we
> have some USB traffic analyzer that shows what's going on. Have you
> tried different hosts?
>

You can also disable LPM from the gadget side by setting
dwc->dis_enblslpm_quirk.

BR,
Thinh
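
For anyone who wants to look for the roughly 1-second stalls in their own captures, the gaps can be found mechanically by comparing consecutive timestamps. The following is only a minimal sketch in Python, assuming the trace text uses the same ftrace layout as the excerpt quoted above. The helper name `find_gaps` and the 0.5 s threshold are illustrative choices, not part of any dwc3 tooling.

```python
import re

# Matches the timestamp and tracepoint name in an ftrace line such as
# "irq/15-dwc3-476 [000] d... 232.499480: dwc3_event: event (00000401): ..."
LINE_RE = re.compile(r"\s(\d+\.\d+):\s+(\w+):")


def find_gaps(lines, threshold=0.5):
    """Return (prev_ts, ts, event, gap) for every jump above threshold seconds.

    threshold is an arbitrary illustrative value; tune it for your trace.
    """
    gaps = []
    prev = None
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip headers, blank lines, anything without a timestamp
        ts = float(m.group(1))
        if prev is not None and ts - prev > threshold:
            gaps.append((prev, ts, m.group(2), ts - prev))
        prev = ts
    return gaps


# Three lines taken from the trace excerpt in this thread.
sample = [
    "ksoftirqd/0-10 [000] d.s. 231.501810: dwc3_writel: addr 00000000d68ecd36 value 0000a710",
    "<idle>-0 [000] d.h. 232.499418: dwc3_readl: addr 00000000a15e0e35 value 00000034",
    "irq/15-dwc3-476 [000] d... 232.499480: dwc3_event: event (00000401): WakeUp [U0]",
]

for prev_ts, ts, event, gap in find_gaps(sample):
    print(f"{gap:.3f}s idle before {event} at {ts:.6f}")
```

To run it on a full capture, pass an open file instead of `sample`, for example `find_gaps(open("trace.txt"))`, where `trace.txt` is a placeholder for your own trace file. Comparing the number and position of gaps between the "good" and "bad" traces may show whether the stalls line up with the WakeUp events.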