On Tue, 2024-09-17 at 16:36 +0200, Sebastian Andrzej Siewior wrote: > On 2024-09-17 15:56:33 [+0200], Hubert Wiśniewski wrote: > > netif_rx() now disables bottom halves, which causes the USB gadget to be > > unable to receive frames if the interface is not brought up quickly enough > > after being created by the driver (a bug confirmed on AM3352 SoC). > > > > Replacing netif_rx() with __netif_rx() restores the old behavior and fixes > > the bug. This can be done since rx_callback() is called from the interrupt > > context. > > > > Fixes: baebdf48c360 ("net: dev: Makes sure netif_rx() can be invoked in any context.") > > Cc: stable@xxxxxxxxxxxxxxx > > Signed-off-by: Hubert Wiśniewski <hubert.wisniewski.25632@xxxxxxxxx> > > Now that I see v3, my v2 question stands. > This supposed only to disable BH if invoked from from non-interrupt. > hardirq and softirq should be good. A backtrace would be nice and further > explanation how this becomes a problem. Also lockdep should complain at > some point. > > Sebastian This lockup seems to occur only on AM335x SoCs. If the interface is brought up immediately after it is created: # modprobe g_ether; ip link set usb0 up it works fine. But if there is some delay: # modprobe g_ether; sleep 5; ip link set usb0 up the interface is unable to receive any frames from the USB host (but the outgoing frames are sent correctly). This becomes a problem when the g_ether module is loaded early in the boot process or it is built in, and the interface is configured later (e.g. by ifupdown). The same thing happens when using configfs to create and configure an Ethernet gadget. I have built the kernel with CONFIG_PROVE_LOCKING=y and got the following messages upon bringing the interface up: Backtrace (timestamps stripped): WARNING: CPU: 0 PID: 86 at kernel/softirq.c:362 __local_bh_enable_ip+0x118/0x198 CPU: 0 UID: 0 PID: 86 Comm: ip Not tainted 6.11.0 #3 Hardware name: Generic AM33XX (Flattened Device Tree) Call trace: unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x68/0x88 dump_stack_lvl from __warn+0x70/0x1ac __warn from warn_slowpath_fmt+0x12c/0x1c4 warn_slowpath_fmt from __local_bh_enable_ip+0x118/0x198 __local_bh_enable_ip from netif_rx+0x118/0x208 netif_rx from rx_complete+0x140/0x27c rx_complete from musb_g_giveback+0xf0/0x1d8 musb_g_giveback from musb_ep_restart_resume_work+0x8/0x10 musb_ep_restart_resume_work from musb_gadget_queue+0x1c8/0x498 musb_gadget_queue from usb_ep_queue+0x38/0x134 usb_ep_queue from rx_submit+0xdc/0x1cc rx_submit from rx_fill+0x7c/0xa0 rx_fill from eth_start+0x30/0x54 eth_start from eth_open+0x40/0x80 eth_open from __dev_open+0xe0/0x174 __dev_open from __dev_change_flags+0x160/0x1d0 __dev_change_flags from dev_change_flags+0x1c/0x58 dev_change_flags from devinet_ioctl+0x6d0/0x884 devinet_ioctl from inet_ioctl+0x1ac/0x2bc inet_ioctl from sock_ioctl+0x2c0/0x3b8 sock_ioctl from sys_ioctl+0x194/0xf88 sys_ioctl from ret_fast_syscall+0x0/0x1c Exception stack(0xe1485fa8 to 0xe1485ff0) 5fa0: 000b3630 00000001 00000003 00008914 be802cd8 be802c90 5fc0: 000b3630 00000001 00000003 00000036 ffffffff ffffffff ffffffff be802ed4 5fe0: 000c74cc be802c80 00028000 b6e7d14c irq event stamp: 2581 hardirqs last enabled at (2579): [<c13e10cc>] _raw_spin_unlock_irqrestore+0x44/0x48 hardirqs last disabled at (2580): [<c13e0e94>] _raw_spin_lock_irqsave+0x64/0x68 softirqs last enabled at (2558): [<c0e727b0>] __dev_change_flags+0x7c/0x1d0 softirqs last disabled at (2581): [<c0e68dd8>] netif_rx+0xa4/0x208 Lockdep: ================================ WARNING: inconsistent lock state 6.11.0 #3 Tainted: G W -------------------------------- inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. ip/86 [HC0[0]:SC0[0]:HE1:SE1] takes: c3e54050 (&musb->lock){?.-.}-{2:2}, at: musb_g_giveback+0xf8/0x1d8 {IN-HARDIRQ-W} state was registered at: lock_acquire+0x108/0x358 _raw_spin_lock_irqsave+0x4c/0x68 dsps_interrupt+0x28/0x278 __handle_irq_event_percpu+0xa0/0x2f4 handle_irq_event_percpu+0xc/0x40 handle_irq_event+0x38/0xcc handle_level_irq+0xb4/0x13c handle_irq_desc+0x1c/0x2c generic_handle_arch_irq+0x2c/0x64 call_with_stack+0x18/0x20 __irq_svc+0x9c/0xbc console_flush_all+0x25c/0x5c0 console_unlock+0x80/0x114 vprintk_emit+0x248/0x354 dev_vprintk_emit+0x110/0x144 dev_printk_emit+0x28/0x50 __dev_printk+0x74/0x90 _dev_warn+0x3c/0x68 _regulator_get+0x1f8/0x31c sdhci_omap_regulator_get_caps+0x8/0x84 sdhci_omap_probe+0x208/0x788 platform_probe+0x58/0xb8 really_probe+0xc4/0x28c __driver_probe_device+0x84/0x194 driver_probe_device+0x30/0xc8 __device_attach_driver+0xa4/0xe0 bus_for_each_drv+0x80/0xd0 __device_attach_async_helper+0xa8/0xdc async_run_entry_fn+0x20/0xb4 process_scheduled_works+0x254/0x6e8 worker_thread+0x13c/0x340 kthread+0xf4/0x114 ret_from_fork+0x14/0x24 irq event stamp: 2649 hardirqs last enabled at (2649): [<c13e10cc>] _raw_spin_unlock_irqrestore+0x44/0x48 hardirqs last disabled at (2648): [<c13e0e94>] _raw_spin_lock_irqsave+0x64/0x68 softirqs last enabled at (2644): [<c014b410>] handle_softirqs+0x260/0x45c softirqs last disabled at (2613): [<c1385f28>] call_with_stack+0x18/0x20 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&musb->lock); <Interrupt> lock(&musb->lock); *** DEADLOCK *** 1 lock held by ip/86: #0: c1d60ca4 (rtnl_mutex){+.+.}-{3:3}, at: devinet_ioctl+0xc4/0x884 stack backtrace: CPU: 0 UID: 0 PID: 86 Comm: ip Tainted: G W 6.11.0 #3 Tainted: [W]=WARN Hardware name: Generic AM33XX (Flattened Device Tree) Call trace: unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x68/0x88 dump_stack_lvl from mark_lock.part.17+0x3cc/0x440 mark_lock.part.17 from __lock_acquire+0x44c/0x22b0 __lock_acquire from lock_acquire+0x108/0x358 lock_acquire from _raw_spin_lock+0x38/0x48 _raw_spin_lock from musb_g_giveback+0xf8/0x1d8 musb_g_giveback from musb_ep_restart_resume_work+0x8/0x10 musb_ep_restart_resume_work from musb_gadget_queue+0x1c8/0x498 musb_gadget_queue from usb_ep_queue+0x38/0x134 usb_ep_queue from rx_submit+0xdc/0x1cc rx_submit from rx_fill+0x7c/0xa0 rx_fill from eth_start+0x30/0x54 eth_start from eth_open+0x40/0x80 eth_open from __dev_open+0xe0/0x174 __dev_open from __dev_change_flags+0x160/0x1d0 __dev_change_flags from dev_change_flags+0x1c/0x58 dev_change_flags from devinet_ioctl+0x6d0/0x884 devinet_ioctl from inet_ioctl+0x1ac/0x2bc inet_ioctl from sock_ioctl+0x2c0/0x3b8 sock_ioctl from sys_ioctl+0x194/0xf88 sys_ioctl from ret_fast_syscall+0x0/0x1c Exception stack(0xe1485fa8 to 0xe1485ff0) 5fa0: 000b3630 00000001 00000003 00008914 be802cd8 be802c90 5fc0: 000b3630 00000001 00000003 00000036 ffffffff ffffffff ffffffff be802ed4 5fe0: 000c74cc be802c80 00028000 b6e7d14c Surprisingly, with the patch I suggested, there is still a backtrace generated, but without lockdep warning: WARNING: CPU: 0 PID: 87 at net/core/dev.c:5207 __netif_rx+0xd4/0x210 CPU: 0 UID: 0 PID: 87 Comm: ip Not tainted 6.11.0-dirty #4 Hardware name: Generic AM33XX (Flattened Device Tree) Call trace: unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x68/0x88 dump_stack_lvl from __warn+0x70/0x1ac __warn from warn_slowpath_fmt+0x12c/0x1c4 warn_slowpath_fmt from __netif_rx+0xd4/0x210 __netif_rx from rx_complete+0x140/0x27c rx_complete from musb_g_giveback+0xf0/0x1d8 musb_g_giveback from musb_ep_restart_resume_work+0x8/0x10 musb_ep_restart_resume_work from musb_gadget_queue+0x1c8/0x498 musb_gadget_queue from usb_ep_queue+0x38/0x134 usb_ep_queue from rx_submit+0xdc/0x1cc rx_submit from rx_fill+0x7c/0xa0 rx_fill from eth_start+0x30/0x54 eth_start from eth_open+0x40/0x80 eth_open from __dev_open+0xe0/0x174 __dev_open from __dev_change_flags+0x160/0x1d0 __dev_change_flags from dev_change_flags+0x1c/0x58 dev_change_flags from devinet_ioctl+0x6d0/0x884 devinet_ioctl from inet_ioctl+0x1ac/0x2bc inet_ioctl from sock_ioctl+0x2c0/0x3b8 sock_ioctl from sys_ioctl+0x194/0xf88 sys_ioctl from ret_fast_syscall+0x0/0x1c Exception stack(0xe148dfa8 to 0xe148dff0) dfa0: 000b3630 00000001 00000003 00008914 be8c8cd8 be8c8c90 dfc0: 000b3630 00000001 00000003 00000036 ffffffff ffffffff ffffffff be8c8ed4 dfe0: 000c74cc be8c8c80 00028000 b6ed614c irq event stamp: 2542 hardirqs last enabled at (2541): [<c13e10cc>] _raw_spin_unlock_irqrestore+0x44/0x48 hardirqs last disabled at (2542): [<c13e0e94>] _raw_spin_lock_irqsave+0x64/0x68 softirqs last enabled at (2520): [<c0e727b0>] __dev_change_flags+0x7c/0x1d0 softirqs last disabled at (2518): [<c0e7236c>] dev_set_rx_mode+0x0/0x40 I'm a bit at loss here. The deadlock seems to be unrelated to netif_rx() (which is not being called in the interrupt context after all), yet replacing it with __netif_rx() fixes the lockup (though a warning is still generated, which suggests that the patch does not completely fix the issue). -- Hubert Wiśniewski <hubert.wisniewski.25632@xxxxxxxxx>