Ferry Toth wrote:
> Hi,
>
> On 03-04-2021 at 23:15, Ferry Toth wrote:
>> Hi,
>>
>> On 03-04-2021 at 13:25, Ferry Toth wrote:
>>> Hi,
>>>
>>> On 03-04-2021 at 04:02, Thinh Nguyen wrote:
>>>> Ferry Toth wrote:
>>>>> Hi,
>>>>>
>>>>> On 02-04-2021 at 22:16, Thinh Nguyen wrote:
>>>>>> Ferry Toth wrote:
>>>>>>> Hi
>>>>>>>
>>>>>>> On 30-03-2021 at 23:57, Ferry Toth wrote:
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> On 30-03-2021 at 22:26, Ferry Toth wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On 30-03-2021 at 18:17, Felipe Balbi wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Andy Shevchenko <andy.shevchenko@xxxxxxxxx> writes:
>>>>>>>>>>> Hi!
>>>>>>>>>>>
>>>>>>>>>>> I have a platform with DWC3 in Dual Role mode. Currently I'm
>>>>>>>>>>> experimenting on v5.12-rc5 with a few patches (mostly
>>>>>>>>>>> configuration) applied [1]. I'm using Debian Unstable on the
>>>>>>>>>>> host machine and BuildRoot with the above mentioned kernel on
>>>>>>>>>>> the target.
>>>>>>>>>>>
>>>>>>>>>>> **So, scenario 0:
>>>>>>>>>>> 1. Run iperf3 -s on target
>>>>>>>>>>> 2. Run iperf3 -c ... -t 0 on the host
>>>>>>>>>>> 3. 0.00-10.36 sec 237 MBytes 192 Mbits/sec receiver
>>>>>>>>>>>
>>>>>>>>>>> **Scenario 1:
>>>>>>>>>>> 1. Now, detach USB cable, wait for several seconds, attach it
>>>>>>>>>>> back, repeat above:
>>>>>>>>>>> 0.00-9.94 sec 209 MBytes 176 Mbits/sec receiver
>>>>>>>>>>>
>>>>>>>>>>> Note the bandwidth drop (176 vs. 192).
>>>>>>>>>>>
>>>>>>>>>>> (Repeating scenario 1 will now give the same result)
>>>>>>>>>>>
>>>>>>>>>>> **Scenario 2.
>>>>>>>>>>> 1. Detach USB cable, attach a device, for example USB stick,
>>>>>>>>>>> 2. See it being enumerated and detach it.
>>>>>>>>>>> 3. Attach cable from host
>>>>>>>>>>> 4. 0.00-19.36 sec 315 MBytes 136 Mbits/sec receiver
>>>>>>>>>>>
>>>>>>>>>>> Note even more bandwidth drop!
>>>>>>>>>>>
>>>>>>>>>>> (Repeating scenario 1 keeps the same lower bandwidth)
>>>>>>>>>>>
>>>>>>>>>>> NOTE, sometimes on this scenario after several seconds the
>>>>>>>>>>> target simply reboots (w/o any logs [from kernel] printed)!
>>>>>>>>>>>
>>>>>>>>>>> So, any pointers on how to debug and what can be a smoking
>>>>>>>>>>> gun here?
>>>>>>>>>>>
>>>>>>>>>>> Ferry reported this in [2]. There are different kernel
>>>>>>>>>>> versions and tools to establish the connection (like connman
>>>>>>>>>>> vs. none in my case).
>>>>>>>>>>>
>>>>>>>>>>> [1]:
>>>>>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/__;!!A4F2R9G_pg!KpQnudHIK6XgK6HbPaqtbVgipDmkNBWewo-euAIuBlGdtSiaQiJ8jLn9OoMEppG6qq-d$
>>>>>>>>>>>
>>>>>>>>>>> [2]:
>>>>>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/issues/31__;!!A4F2R9G_pg!KpQnudHIK6XgK6HbPaqtbVgipDmkNBWewo-euAIuBlGdtSiaQiJ8jLn9OoMEptMCrp-F$
>>>>>>>>>>>
>>>>>>>>>> dwc3 tracepoints should give some initial hints. Look at packets
>>>>>>>>>> sizes and period of transmission. From dwc3 side, I can't think
>>>>>>>>>> of anything we would do to throttle the transmission, but
>>>>>>>>>> tracepoints should tell a clearer story.
>>>>>>>>>>
>>>>>>>>> My testing (but yes, with a different kernel and network managed
>>>>>>>>> by connman) shows:
>>>>>>>>>
>>>>>>>>> 1) on cold boot eem network gadget works fine
>>>>>>>>>
>>>>>>>>> 2) after unplug or warm reboot (which is also an unplug) it's
>>>>>>>>> broken, speed is lost (12.0 Mbits/sec from 200Mb/s normally),
>>>>>>>>> packets lost, no configuration received from dhcp, occasional
>>>>>>>>> reboot, only way to fix is cold boot
>>>>>>>>>
>>>>>>>>> 3) if before unplug `connmanctl disable gadget`, on replugging and
>>>>>>>>> enabling it works fine
>>>>>>>>>
>>>>>>>>> My theory is that some HW register is disturbed on a surprise
>>>>>>>>> unplug, but not reset on plug or warm boot. But on cold boot it
>>>>>>>>> is cleared. Maybe that can help to narrow down tracepoints?
>>>>>>>>>
>>>>>>>> I captured a plug after warm and after cold boot. This includes
>>>>>>>> network setup (dhcp). You can find it in [2] or directly link here:
>>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/files/6232410/boot.zip
>>>>>>>>
>>>>>>>
>>>>>>> While the above traces in boot.zip allow comparing which regs are
>>>>>>> not correctly initialized on warm boot, I have now captured traces
>>>>>>> of unplug/plug.
>>>>>>>
>>>>>>> Here kernel is 5.10.27 (LTS), cold booted with USB cable plugged
>>>>>>> and the eem gadget network setup (dhcp). Then trace unplug. Then
>>>>>>> trace plug.
>>>>>>>
>>>>>>> After plug the eem connection is again broken.
>>>>>>>
>>>>>>> This might allow figuring out what goes wrong on unplug. Traces
>>>>>>> here:
>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/files/6250924/plug-unplug.zip
>>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Were you able to narrow down the issue to only DWC3 device? (i.e. you
>>>>>> tested with different hosts and different device controllers to
>>>>>> confirm this)
>>>>> I haven't tried with other devices.
>>>>> I have been forced to replace my host mobo and nothing changed.
>>>>> But I didn't pay attention to the particular host controller.
>>>>>
>>>> It'd be better if we can narrow down the culprit as this seems to me
>>>> like a synchronization issue at the upper layer between the host and
>>>> device.
>>>>
>>>>>> Did you see this issue previously? If not, is it possible to do git
>>>>>> bisection?
>>>>> This is with Intel Edison where main line usb gadget support appeared
>>>>> around 4.19 iirc. I believed the problem appeared between 5.4 and 5.7
>>>>> and tried to bisect but failed.
>>>>>
>>>>> I realize only now that I failed because:
>>>>> 1) 5.4 already has this issue as I recently retested
>>>> I'm confused, why do you believe the problem is between 5.4 and 5.7 if
>>>> 5.4 already has this issue? So when did you start seeing this problem?
>>>
>>> Because at the time of 5.4 I didn't notice the issue as I normally
>>> did cold boots due to other problems on warm boot (i.e. sdhc
>>> inaccessible).
>>>
>>> I never knew that on a cold boot it works. Even during bisecting I
>>> didn't know until the end, and then I found 5.4 has the same problem
>>> as all the later kernels (tested up to 5.11)
>>>
>>>> Also, these kernel versions are really old, there's been a lot of
>>>> updates/fixes to dwc3 since then. Can we run tests on the latest
>>>> kernel?
>>>
>>> I have tested 5.10.27, 5.11.0 and 5.11.4-rt11.
>>>
>>> But of course I am completely prepared to run Andy's latest
>>> (v5.12-rc5) on the device.
>>>
>>>>> 2) I didn't use a reproducible criterion. After warm reboot the eem
>>>>> gadget fails, but you can flip the host/gadget switch back and
>>>>> forth and have the illusion that the connection is restored.
>>>>>
>>>>> The scenario described here is reproducible: leaving the switch in
>>>>> gadget mode eem works after cold boot only. And it likely breaks on
>>>>> unplug.
>>>>>
>>>>> A 2nd hint is that disabling gadget (I used `connmanctl disable
>>>>> gadget` but I believe that has the same effect as
>>>>> `ip link set dev usb0 down`) before unplug prevents messing up the
>>>>> driver, so you can replug and enable again.
>>>> These data points are good. However, we'd need to know where to look
>>>> first. The issue isn't obvious from the DWC3 controller or the DWC3
>>>> driver.
>>>>
>>>> Can you check a few things:
>>>> 1) Any error/timeout messages from the host's dmesg? Or device side?
>>>
>>> I'll add log from the host side.
>>>
>>> For now I only see (on a warm plug):
>>>
>>> kernel: usb 1-11: can't set config #1, error -110
>>>
>>>> 2) What kernel version is your host using? Can you use the latest for
>>>> both host and device?
>>>
>>> The host is ubuntu's amd64 5.8.0-48-generic.
>>>
>>> I will test with v5.12-rc5 from ubuntu kernel ppa on the host. And
>>> Andy's latest (v5.12-rc5) on the device.
>>
>> I upgraded host kernel, but not yet device and captured relevant host
>> journal messages and device traces. Something did change: after cold
>> boot I don't get an eem connection until after I unplug/replug. I then
>> traced an iperf transfer. Then after again unplug/replug I get the
>> throttled connection, which I also traced.
>>
>> See
>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/files/6253414/transfer.zip__;!!A4F2R9G_pg!IYAMgA0GMmo4BNpeXScb5Aix0IrxsxJhdCh9d5-75fAnJtSwcSG5e1az-x_101A7wlQD$
>>
>
> Now, with host updated to ubuntu kernel ppa 5.12.0-051200rc5-generic and
> edison to 5.12.0-rc5-edison-acpi-standard vanilla + 2 patches appearing
> in rc6:
>
> * "usb: dwc3: gadget: Clear DEP flags after stop transfers in ep disable"
> * "usb: dwc3: pci: Enable dis_uX_susphy_quirk for Intel Merrifield"
>
> plus one from
> https://urldefense.com/v3/__https://github.com/andy-shev/linux/commits/eds-acpi__;!!A4F2R9G_pg!IYAMgA0GMmo4BNpeXScb5Aix0IrxsxJhdCh9d5-75fAnJtSwcSG5e1az-x_105iEe4TE$
>
> * "TODO: driver core: Break infinite loop when deferred probe can't be
> satisfied"
>
> I captured one good and one bad connection, plus logs on the host side,
> see journalctl-plus-comments.txt in
> https://urldefense.com/v3/__https://github.com/andy-shev/linux/files/6260614/5.12-rc5.zip__;!!A4F2R9G_pg!IYAMgA0GMmo4BNpeXScb5Aix0IrxsxJhdCh9d5-75fAnJtSwcSG5e1az-x_10w5OhpD1$
>
>>
>>> I am expecting results this evening.
>>>
>>>> 3) Snapshot of dwc3 tracepoints of active transfers between the normal
>>>> vs throttled of the latest kernel
>>>
>>> I don't know if the problem I see is really throttling.
>>>
>>> I can trace an active transfer, but that does actually throttle from
>>> 200Mb/s down to 139Mb/s and produces a trace of 53MB. (2x1sec of
>>> iperf3).
>>>

I took a look at the "bad" and "normal" tracepoints. There are a few
1-second delays where the host tried to bring the device back and resume
from low power:

ksoftirqd/0-10 [000] d.s. 231.501808: dwc3_gadget_ep_cmd: ep3in: cmd 'Update Transfer' [60007] params 00000000 00000000 00000000 --> status: Successful
ksoftirqd/0-10 [000] d.s. 231.501809: dwc3_readl: addr 00000000d68ecd36 value 0000a610
ksoftirqd/0-10 [000] d.s. 231.501810: dwc3_writel: addr 00000000d68ecd36 value 0000a710
<idle>-0 [000] d.h. 232.499418: dwc3_readl: addr 00000000a15e0e35 value 00000034
<idle>-0 [000] d.h. 232.499423: dwc3_readl: addr 00000000bb67b585 value 00001000
<idle>-0 [000] d.h. 232.499425: dwc3_writel: addr 00000000bb67b585 value 80001000
<idle>-0 [000] d.h. 232.499427: dwc3_writel: addr 00000000a15e0e35 value 00000034
irq/15-dwc3-476 [000] d... 232.499480: dwc3_event: event (00000401): WakeUp [U0]
irq/15-dwc3-476 [000] d... 232.499492: dwc3_event: event (00000401): WakeUp [U0]
irq/15-dwc3-476 [000] d... 232.499496: dwc3_event: event (00006088): ep2out: Transfer In Progress [0] (SIm)
irq/15-dwc3-476 [000] d... 232.499501: dwc3_complete_trb: ep2out: trb 00000000c7ce524e (E179:D170) buf 0000000008273540 size 1463 ctrl 00000818 (hlcS:sC:normal)
irq/15-dwc3-476 [000] d... 232.499518: dwc3_gadget_giveback: ep2out: req 0000000012e296cf length 73/1536 zsI ==> 0
irq/15-dwc3-476 [000] d... 232.499562: dwc3_ep_queue: ep2out: req 0000000012e296cf length 0/1536 zsI ==> -115
irq/15-dwc3-476 [000] d... 232.499601: dwc3_prepare_trb: ep2out: trb 000000008c083777 (E180:D170) buf 0000000002a7e9c0 size 1536 ctrl 00000819 (HlcS:sC:normal)

Your device is operating in high speed, right? Try to turn off LPM from
the host and see if that helps with the speed throttling issue. (If
you're using an xHCI host, then set XHCI_HW_LPM_DISABLE.) It may also
help with the connection issue you saw.

It seems to be an issue from the host, but I can't tell for sure unless
we have some USB traffic analyzer that shows what's going on. Have you
tried different hosts?

BR,
Thinh
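
P.S. For anyone who wants to look for the same kind of stall in their own
capture: below is a minimal sketch (not part of the dwc3 tooling) that
scans trace lines shaped like the excerpt above and reports every pair of
consecutive events separated by more than a chosen gap. The regular
expression, the function names and the 0.5 second threshold are my own
assumptions for illustration; the sample lines are copied from the trace
above.

```python
import re

# Assumed line layout, taken from the excerpt above:
#   <task>-<pid> [cpu] flags timestamp: event: details
TRACE_LINE = re.compile(
    r"^\s*(?P<task>.+?)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+(?P<flags>\S+)\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>\w+):\s*(?P<detail>.*)$"
)


def parse_trace(lines):
    """Return (timestamp, event, detail) tuples for lines that match."""
    events = []
    for line in lines:
        m = TRACE_LINE.match(line)
        if m:
            events.append((float(m["ts"]), m["event"], m["detail"]))
    return events


def find_gaps(events, threshold=0.5):
    """Report consecutive events that are at least `threshold` seconds apart.

    The 0.5 s default is an arbitrary choice for this sketch.
    """
    gaps = []
    for (t0, ev0, _), (t1, ev1, _) in zip(events, events[1:]):
        delta = t1 - t0
        if delta >= threshold:
            gaps.append((t0, t1, delta, ev0, ev1))
    return gaps


# Four lines copied from the trace excerpt in this thread.
SAMPLE = """\
ksoftirqd/0-10 [000] d.s. 231.501809: dwc3_readl: addr 00000000d68ecd36 value 0000a610
ksoftirqd/0-10 [000] d.s. 231.501810: dwc3_writel: addr 00000000d68ecd36 value 0000a710
<idle>-0 [000] d.h. 232.499418: dwc3_readl: addr 00000000a15e0e35 value 00000034
irq/15-dwc3-476 [000] d... 232.499480: dwc3_event: event (00000401): WakeUp [U0]
"""

if __name__ == "__main__":
    for t0, t1, delta, before, after in find_gaps(parse_trace(SAMPLE.splitlines())):
        print(f"{delta:.6f}s idle between {before} @ {t0:.6f} and {after} @ {t1:.6f}")
```

On a full capture you would feed it the lines of the trace file instead of
SAMPLE. Checking whether a WakeUp event follows shortly after each reported
gap is one way to tell a low-power stall apart from an ordinary pause in
traffic.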