On Thu, May 02, 2024 at 05:29:16PM +0200, Hardik Gajjar wrote: > On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote: > > Op 30-04-2024 om 21:40 schreef Ferry Toth: > > > Op 30-04-2024 om 17:32 schreef Hardik Gajjar: > > > > On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote: > > > > > Op 25-04-2024 om 23:27 schreef Ferry Toth: > > > > > > Op 17-04-2024 om 17:13 schreef Hardik Gajjar: > > > > > > > On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote: > > > > > > > > On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote: > > > > > > > > > Op 11-04-2024 om 18:39 schreef Andy Shevchenko: > > > > > > > > > > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote: > > > > > > > > > > > On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote: > > > > > > > > > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote: > > > > > > > > > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar: ... > > > > > > > > > > > > > Exactly. And this didn't happen before the 2 patches. > > > > > > > > > > > > > > > > > > > > > > > > > > To be precise: /sys/class/net/usb0 is not > > > > > > > > > > > > > removed and it is a link, the link > > > > > > > > > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 > > > > > > > > > > > > > no > > > > > > > > > > > > > longer exists > > > > > > > > > > > So, it means that the /sys/class/net/usb0 is > > > > > > > > > > > present, but the symlink is > > > > > > > > > > > broken. In that case, the dwc3 driver should > > > > > > > > > > > recreate the device, and the > > > > > > > > > > > symlink should become active again > > > > > > > > > > > > > > > > > > Yes, on first enabling gadget (when device mode is activated): > > > > > > > > > > > > > > > > > > root@yuna:~# ls > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > > > > > > > > driver net power sound subsystem suspended uevent > > > > > > > > > > > > > > > > > > Then switching to host mode: > > > > > > > > > > > > > > > > > > root@yuna:~# ls > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > > > > > > > > ls: cannot access > > > > > > > > > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': > > > > > > > > > No such file > > > > > > > > > or directory > > > > > > > > > > > > > > > > > > Then back to device mode: > > > > > > > > > > > > > > > > > > root@yuna:~# ls > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > > > > > > > > driver power sound subsystem suspended uevent > > > > > > > > > > > > > > > > > > net is missing. But, network functions: > > > > > > > > > > > > > > > > > > root@yuna:~# ping 10.42.0.1 > > > > > > > > > PING 10.42.0.1 (10.42.0.1): 56 data bytes > > > > > > > > > > > > > > > > > > Mass storage device is created and removed each time as expected. > > > > > > > > > > > > > > > > So, what's the conclusion? Shall we move towards revert of those > > > > > > > > two changes? > > > > > > > > > > > > > > > > > > > > > As promised, I have the tested the this patch with the dwc3 gadget. > > > > > > > I could not reproduce > > > > > > > the issue. > > > > > > > > > > > > > > I can see the usb0 exist all the time and accessible regardless of > > > > > > > the role switching of the USB mode (peripheral <-> host) > > > > > > > > > > > > > > Following are the logs: > > > > > > > //Host to device > > > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral" > > > > > > > > mode > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls > > > > > > > a800000.dwc3/gadget/net/ > > > > > > > usb0 > > > > > > > > > > > > > > //device to host > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls > > > > > > > a800000.dwc3/gadget/net/ > > > > > > > usb0 > > > > > > > > > > > > That is weird. When I switch to host mode (using the physical switch), > > > > > > the whole gadget directory is removed (now testing 6.9.0-rc5) > > > > > > > > > > > > Switching back to device mode, that gadget directory is recreated. And > > > > > > gadget/sound as well, but not gadget/net. > > > > > > > > > > > > > s a800000.dwc3/gadget/net/usb0 > > > > > > > < > > > > > > > addr_assign_type duplex phys_port_name > > > > > > > addr_len flags phys_switch_id > > > > > > > address gro_flush_timeout power > > > > > > > broadcast ifalias proto_down > > > > > > > carrier ifindex queues > > > > > > > carrier_changes iflink speed > > > > > > > carrier_down_count link_mode statistics > > > > > > > carrier_up_count mtu subsystem > > > > > > > dev_id name_assign_type tx_queue_len > > > > > > > dev_port netdev_group type > > > > > > > device operstate uevent > > > > > > > dormant phys_port_id waiting_for_supplier > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0 > > > > > > > usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a > > > > > > > BROADCAST MULTICAST MTU:1500 Metric:1 > > > > > > > RX packets:0 errors:0 dropped:0 overruns:0 frame:0 > > > > > > > TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 > > > > > > > collisions:0 txqueuelen:1000 > > > > > > > RX bytes:0 TX bytes:0 > > > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # > > > > > > > > > > > > > > I strongly advise against reverting the patch solely based on the > > > > > > > observed issue of removing the /sys/class/net/usb0 directory while > > > > > > > the usb0 interface remains available. > > > > > > > > > > > > There's more to it. I also mentioned that switching the role or > > > > > > unplugging the cable leaves the usb0 connection. > > > > > > > > > > > > I have while in host mode: > > > > > > root@yuna:~# ifconfig -a usb0 > > > > > > usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> mtu 1500 > > > > > > inet 10.42.0.221 netmask 255.255.255.0 broadcast > > > > > > 10.42.0.255 > > > > > > inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64 > > > > > > scopeid 0x20<link> > > > > > > > > > > > > > > > > > > You don't see that because you didn't create a connection at all. > > > > > > > > > > > > > Instead, I recommend enabling FTRACE to trace the functions involved > > > > > > > and identify which faulty call is responsible for removing usb0. > > > > > > > > > > > > Switching from device -> host -> device: > > > > > > > > > > > > root@yuna:~# trace-cmd record -p function_graph -l *gether_* > > > > > > plugin 'function_graph' > > > > > > Hit Ctrl^C to stop recording > > > > > > ^CCPU0 data recorded at offset=0x1c8000 > > > > > > 188 bytes in size (4096 uncompressed) > > > > > > CPU1 data recorded at offset=0x1c9000 > > > > > > 0 bytes in size (0 uncompressed) > > > > > > root@yuna:~# trace-cmd report > > > > > > cpus=2 > > > > > > irq/68-dwc3-725 [000] 514.575337: funcgraph_entry: # > > > > > > 2079.480 us | gether_disconnect(); > > > > > > irq/68-dwc3-946 [000] 524.263731: funcgraph_entry: + > > > > > > 11.640 us | gether_disconnect(); > > > > > > irq/68-dwc3-946 [000] 524.263743: funcgraph_entry: ! > > > > > > 116.520 us | gether_connect(); > > > > > > irq/68-dwc3-946 [000] 524.268029: funcgraph_entry: # > > > > > > 2057.260 us | gether_disconnect(); > > > > > > irq/68-dwc3-946 [000] 524.270089: funcgraph_entry: ! > > > > > > 109.000 us | gether_connect(); > > > > > > > > > > I tried to get a more useful trace: > > > > > root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter > > > > > root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter > > > > > root@yuna:/sys/kernel/tracing# echo function > current_tracer > > > > > root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter > > > > > -> switch to host mode then back to device > > > > > root@yuna:/sys/kernel/tracing# cat trace > > > > > # tracer: function > > > > > # > > > > > # entries-in-buffer/entries-written: 53/53 #P:2 > > > > > # > > > > > # _-----=> irqs-off/BH-disabled > > > > > # / _----=> need-resched > > > > > # | / _---=> hardirq/softirq > > > > > # || / _--=> preempt-depth > > > > > # ||| / _-=> migrate-disable > > > > > # |||| / delay > > > > > # TASK-PID CPU# ||||| TIMESTAMP FUNCTION > > > > > # | | | ||||| | | > > > > > irq/68-dwc3-523 [000] D..3. 133.990254: reset_config > > > > > <-__composite_disconnect > > > > > irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable > > > > > <-reset_config > > > > > irq/68-dwc3-523 [000] D..3. 133.992276: gether_disconnect > > > > > <-reset_config > > > > > kworker/1:3-443 [001] ...1. 134.022453: eem_unbind > > > > > <-purge_configs_funcs > > > > > > > > > > -> to device mode > > > > > > > > > > kworker/1:3-443 [001] ...1. 148.630773: eem_bind > > > > > <-usb_add_function > > > > > irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt > > > > > <-composite_setup > > > > > irq/68-dwc3-734 [000] D..3. 149.155215: gether_disconnect > > > > > <-eem_set_alt > > > > > irq/68-dwc3-734 [000] D..3. 149.155220: gether_connect > > > > > <-eem_set_alt > > > > > irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt > > > > > <-composite_setup > > > > > irq/68-dwc3-734 [000] D..3. 149.157292: gether_disconnect > > > > > <-eem_set_alt > > > > > irq/68-dwc3-734 [000] D..3. 149.159338: gether_connect > > > > > <-eem_set_alt > > > > > irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap > > > > > <-rx_complete > > > > > ... > > > > > > > > > > I don't know where to look exactly. Any hints? > > > > > > > > do you see anything related to gether_cleanup() after eem_unbind() ? > > > > > > Nope. It's a pitty that the trace formatting got messed up above. But as > > > you can see I traced gether_* and eem_*. After eem_unbind no traced > > > function is called, until I flip the switch to device mode. > > > The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete > > > and eem_wrap <-eth_start_xmit lines. > > > > > > > If not then, you may try to enable tracing of TCP/IP stack and > > > > network side to check who deleting the sysfs entry > > > > > > Yes, that's a vast amount of functions to trace. And I don't see how > > > that would be related to the patch we're discussing here. I was hoping > > > for a little more targeted hint. > > > > Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*', > > 'reset_config' and 'device_remove_file' leads me to: > > > > TIMESTAMP FUNCTION > > | | > > 49.952477: eem_wrap <-eth_start_xmit > > 55.072455: eem_wrap <-eth_start_xmit > > 55.072621: eem_unwrap <-rx_complete > > 59.011540: configfs_composite_reset <-usb_gadget_udc_reset > > 59.011545: composite_reset <-configfs_composite_reset > > 59.011548: reset_config <-__composite_disconnect > > 59.013565: eem_disable <-reset_config > > 59.013567: gether_disconnect <-reset_config > > 59.049560: device_remove_file <-device_remove > > 59.051185: configfs_composite_disconnect <-usb_gadget_disco > > 59.051189: composite_disconnect <-configfs_composite_discon > > 59.051195: configfs_composite_unbind <-gadget_unbind_driver > > 59.052519: eem_unbind <-purge_configs_funcs > > 59.052529: composite_dev_cleanup <-configfs_composite_unbin > > 59.052537: device_remove_file <-composite_dev_cleanup > > > > device_remove_file gets called twice, once by device_remove after > > gether_disconnect (that the one). The 2nd time by composite_dev_cleanup > > (removing the gadget) > > I believe that the device_remove_file function is only removing > suspend-specific attributes, not the complete gadget. Typically, when you > perform the role switch, the Gadget start/stop function in your UDC driver is > called. These functions should not delete the gadget > > To investigate further, could you please enable the DWC3 functions in ftrace > and check who is removing the gadget? I can also enable this on my system > and compare the logs with yours, but I will be in PI planning for 1.5 weeks > and may not be able to provide immediate support. Since we are almost at -rc7, I propose to revert and try again next cycle. > Additionally, please check if you have any customized DWC patches that may be causing this problem. > > > > You may recall the whole issue did not occur before this patch got applied. > > > > > > > > > > According to current kernel architecture of u_ether driver, only > > > > > > > gether_cleanup should remove the usb0 interface along with its > > > > > > > kobject and sysfs interface. > > > > > > > I suggest sharing the analysis here to understand why this practice > > > > > > > is not followed in your use case or driver ? > > > > > > > > > > > > Yes, I'll try to trace where that happens. > > > > > > > > > > > > Nevertheless, the disappearance of the net/usb0 directory seems > > > > > > harmless? But the usb: net device remaining after disconnect or role > > > > > > switch is not good, as the route remains. > > > > > > > > > > > > May be they are 2 separate problems. Could you try to reproduce what > > > > > > happens if you make eem connection and then unplug? > > > > > > > > > > > > > I am curious why the driver was developed without adhering to the > > > > > > > kernel's gadget architecture. > > > > > > > > > > I don't know what you mean here. Which driver do you mean? > > > > > > > > > > > > > > > > I have the dwc3 IP base usb controller, Let me check > > > > > > > > > > > with this patch and > > > > > > > > > > > share result here. May be we need some fix in dwc3 > > > > > > > > > Would have been nice if someone could test on other > > > > > > > > > controller as well. But > > > > > > > > > another instance of dwc3 is also very welcome. > > > > > > > > > > It's quite possible, please test on your side. > > > > > > > > > > We are happy to test any fixes if you come up with. -- With Best Regards, Andy Shevchenko