Hi Op 15-05-2024 om 20:38 schreef Ferry Toth:
Hi, Sorry to have kept you waiting, I was away for a short break. Patch tested below. Op 10-05-2024 om 11:45 schreef Hardik Gajjar:On Thu, May 02, 2024 at 10:32:16PM +0200, Ferry Toth wrote:Could you please try with the following patch and see if your issue resolves ?Oops, sorry, wrong file attached . Now correct one. Op 02-05-2024 om 22:13 schreef Ferry Toth:Op 02-05-2024 om 17:29 schreef Hardik Gajjar:On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:Hi, Op 30-04-2024 om 21:40 schreef Ferry Toth:Hi, Op 30-04-2024 om 17:32 schreef Hardik Gajjar:On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:Hi, Op 25-04-2024 om 23:27 schreef Ferry Toth:Hi, Op 17-04-2024 om 17:13 schreef Hardik Gajjar:On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:Op 11-04-2024 om 18:39 schreef Andy Shevchenko:On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:Op 05-04-2024 om 13:38 schreef Hardik Gajjar:...Exactly. And this didn't happen before the 2 patches. To be precise: /sys/class/net/usb0 is not removed and it is a link, the linktarget /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0no longer existsSo, it means that the /sys/class/net/usb0 is present, but the symlink is broken. In that case, the dwc3 driver should recreate the device, and the symlink should become active againYes, on first enabling gadget (when device mode is activated): root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ driver net power sound subsystem suspended uevent Then switching to host mode: root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ ls: cannot access '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': No such file or directory Then back to device mode: root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ driver power sound subsystem suspended uevent net is missing. But, network functions: root@yuna:~# ping 10.42.0.1 PING 10.42.0.1 (10.42.0.1): 56 data bytesMass storage device is created and removed each time as expected.So, what's the conclusion? Shall we move towards revert of thosetwo changes?As promised, I have the tested the this patch with the dwc3 gadget.I could not reproduce the issue.I can see the usb0 exist all the time and accessible regardless ofthe role switching of the USB mode (peripheral <-> host) Following are the logs: //Host to deviceconsole:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"modeconsole:/sys/bus/platform/devices/a800000.ssusb # ls a800000.dwc3/gadget/net/ usb0 //device to host console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode console:/sys/bus/platform/devices/a800000.ssusb # ls a800000.dwc3/gadget/net/ usb0That is weird. When I switch to host mode (using the physical switch), the whole gadget directory is removed (now testing 6.9.0-rc5) Switching back to device mode, that gadget directory is recreated. And gadget/sound as well, but not gadget/net.s a800000.dwc3/gadget/net/usb0 < addr_assign_type duplex phys_port_name addr_len flags phys_switch_id address gro_flush_timeout power broadcast ifalias proto_down carrier ifindex queues carrier_changes iflink speed carrier_down_count link_mode statistics carrier_up_count mtu subsystem dev_id name_assign_type tx_queue_len dev_port netdev_group type device operstate uevent dormant phys_port_id waiting_for_supplierconsole:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a BROADCAST MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0TX packets:0 errors:0 dropped:0 overruns:0 carrier:0collisions:0 txqueuelen:1000 RX bytes:0 TX bytes:0 console:/sys/bus/platform/devices/a800000.ssusb #I strongly advise against reverting the patch solely based on the observed issue of removing the /sys/class/net/usb0 directory whilethe usb0 interface remains available.There's more to it. I also mentioned that switching the role or unplugging the cable leaves the usb0 connection. I have while in host mode: root@yuna:~# ifconfig -a usb0usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> mtu 1500inet 10.42.0.221 netmask 255.255.255.0 broadcast 10.42.0.255 inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64 scopeid 0x20<link> You don't see that because you didn't create a connection at all.Instead, I recommend enabling FTRACE to trace the functions involved and identify which faulty call is responsible for removing usb0.Switching from device -> host -> device: root@yuna:~# trace-cmd record -p function_graph -l *gether_* plugin 'function_graph' Hit Ctrl^C to stop recording ^CCPU0 data recorded at offset=0x1c8000 188 bytes in size (4096 uncompressed) CPU1 data recorded at offset=0x1c9000 0 bytes in size (0 uncompressed) root@yuna:~# trace-cmd report cpus=2 irq/68-dwc3-725 [000] 514.575337: funcgraph_entry: # 2079.480 us | gether_disconnect(); irq/68-dwc3-946 [000] 524.263731: funcgraph_entry: + 11.640 us | gether_disconnect(); irq/68-dwc3-946 [000] 524.263743: funcgraph_entry: ! 116.520 us | gether_connect(); irq/68-dwc3-946 [000] 524.268029: funcgraph_entry: # 2057.260 us | gether_disconnect(); irq/68-dwc3-946 [000] 524.270089: funcgraph_entry: ! 109.000 us | gether_connect();I tried to get a more useful trace: root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter root@yuna:/sys/kernel/tracing# echo function > current_tracer root@yuna:/sys/kernel/tracing# echo 'reset_config'set_ftrace_filter-> switch to host mode then back to device root@yuna:/sys/kernel/tracing# cat trace # tracer: function # # entries-in-buffer/entries-written: 53/53 #P:2 # # _-----=> irqs-off/BH-disabled # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / _-=> migrate-disable # |||| / delay # TASK-PID CPU# ||||| TIMESTAMP FUNCTION # | | | ||||| | | irq/68-dwc3-523 [000] D..3. 133.990254: reset_config <-__composite_disconnect irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable <-reset_config irq/68-dwc3-523 [000] D..3. 133.992276: gether_disconnect <-reset_config kworker/1:3-443 [001] ...1. 134.022453: eem_unbind <-purge_configs_funcs -> to device mode kworker/1:3-443 [001] ...1. 148.630773: eem_bind <-usb_add_function irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt <-composite_setup irq/68-dwc3-734 [000] D..3. 149.155215: gether_disconnect <-eem_set_altirq/68-dwc3-734 [000] D..3. 149.155220: gether_connect<-eem_set_alt irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt <-composite_setup irq/68-dwc3-734 [000] D..3. 149.157292: gether_disconnect <-eem_set_altirq/68-dwc3-734 [000] D..3. 149.159338: gether_connect<-eem_set_alt irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap <-rx_complete ... I don't know where to look exactly. Any hints?do you see anything related to gether_cleanup() after eem_unbind() ?Nope. It's a pitty that the trace formatting got messed up above. But as you can see I traced gether_* and eem_*. After eem_unbind no traced function is called, until I flip the switch to device mode. The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete and eem_wrap <-eth_start_xmit lines.If not then, you may try to enable tracing of TCP/IP stack and network side to check who deleting the sysfs entryYes, that's a vast amount of functions to trace. And I don't see howthat would be related to the patch we're discussing here. I was hopingfor a little more targeted hint.Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*', 'reset_config' and 'device_remove_file' leads me to: TIMESTAMP FUNCTION | | 49.952477: eem_wrap <-eth_start_xmit 55.072455: eem_wrap <-eth_start_xmit 55.072621: eem_unwrap <-rx_complete 59.011540: configfs_composite_reset <-usb_gadget_udc_reset 59.011545: composite_reset <-configfs_composite_reset 59.011548: reset_config <-__composite_disconnect 59.013565: eem_disable <-reset_config 59.013567: gether_disconnect <-reset_config 59.049560: device_remove_file <-device_remove 59.051185: configfs_composite_disconnect <-usb_gadget_disco 59.051189: composite_disconnect <-configfs_composite_discon 59.051195: configfs_composite_unbind <-gadget_unbind_driver 59.052519: eem_unbind <-purge_configs_funcs 59.052529: composite_dev_cleanup <-configfs_composite_unbin 59.052537: device_remove_file <-composite_dev_cleanup device_remove_file gets called twice, once by device_remove aftergether_disconnect (that the one). The 2nd time by composite_dev_cleanup(removing the gadget)I believe that the device_remove_file function is only removing suspend-specific attributes, not the complete gadget. Typically, when you perform the role switch, the Gadget start/stop function in your UDC driver is called. These functions should not delete the gadget To investigate further, could you please enable the DWC3 functions in ftrace and check who is removing the gadget? I can also enable this on my system and compare the logs with yours, but I will be in PI planning for 1.5 weeks and may not be able to provide immediate support.Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so trace is 600kb. I cut uninteresting stuff before configfs_composite_reset <-usb_gadget_udc_reset and after__dwc3_set_mode, <https://urldefense.proofpoint.com/v2/url?u=http-3A__300linesremain.Seeattachedtar.gzforyouto&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=zdiBhk-2V5AXxu707QAhbCgWR4qNVRARBmxN17nVB69gVOm-QPqrJeKpo4_trszw&s=ixagWKgLs6wQDJfwh4vIDQNDiy8GZnK9KnUELIfiJz0&e=>compare.diff --git a/drivers/usb/gadget/function/f_eem.c b/drivers/usb/gadget/function/f_eem.cindex 3b445bd88498..c2a904475083 100644 --- a/drivers/usb/gadget/function/f_eem.c +++ b/drivers/usb/gadget/function/f_eem.c@@ -247,7 +247,7 @@ static int eem_bind(struct usb_configuration *c, struct usb_function *f)struct usb_composite_dev *cdev = c->cdev; struct f_eem *eem = func_to_eem(f); struct usb_string *us; - int status; + int status = 0; struct usb_ep *ep; struct f_eem_opts *eem_opts;@@ -260,16 +260,20 @@ static int eem_bind(struct usb_configuration *c, struct usb_function *f)* with list_for_each_entry, so we assume no race condition * with regard to eem_opts->bound access */ + mutex_lock(&eem_opts->lock); + gether_set_gadget(eem_opts->net, cdev->gadget); + if (!eem_opts->bound) { - mutex_lock(&eem_opts->lock); - gether_set_gadget(eem_opts->net, cdev->gadget); status = gether_register_netdev(eem_opts->net); - mutex_unlock(&eem_opts->lock); - if (status) - return status; - eem_opts->bound = true; } + mutex_unlock(&eem_opts->lock); + + if (status) + goto fail; + + eem_opts->bound = true; + us = usb_gstrings_attach(cdev, eem_strings, ARRAY_SIZE(eem_string_defs));After switching from host -> device -> host the usb0 device as seen by ifconfig remains "RUNNING" and in the routing table.FYI This is the regression.
For us, I don't see any advantages in having this patch and only the downside. Do you have any further ideas / patches that we could test.
Otherwise, it might be best to go with the original suggestion and revert until it gets sorted out?
Also /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net is still missing.After deeper diving into v6.1.0, I found the latter a longer existing bug, not caused by your patch.More over, this doesn't seem to do any harm.The first issue does, as we try to route traffic over a device that no longer exists.Additionally, please check if you have any customized DWC patches that may be causing this problem.You may recall the whole issue did not occur before this patch got applied.HardikAccording to current kernel architecture of u_ether driver, only gether_cleanup should remove the usb0 interface along with its kobject and sysfs interface.I suggest sharing the analysis here to understand why this practiceis not followed in your use case or driver ?Yes, I'll try to trace where that happens. Nevertheless, the disappearance of the net/usb0 directory seemsharmless? But the usb: net device remaining after disconnect or roleswitch is not good, as the route remains.May be they are 2 separate problems. Could you try to reproduce whathappens if you make eem connection and then unplug?I am curious why the driver was developed without adhering to thekernel's gadget architecture.I don't know what you mean here. Which driver do you mean?I have the dwc3 IP base usb controller, Let me check with this patch and share result here. May be we need some fix in dwc3Would have been nice if someone could test on other controller as well. But another instance of dwc3 is also very welcome.It's quite possible, please test on your side. We are happy to test any fixes if you come up with.-- With Best Regards, Andy Shevchenko