Hi, On 18.02.2022 10:34, Sebastian Andrzej Siewior wrote: > On 2022-02-17 15:08:55 [+0100], Marek Szyprowski wrote: >>>> Marek, does this work for you? >>> Yes, this fixed the issue. Thanks! >>> >>> Tested-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> >> I've just noticed that there is one more issue left to fix (the $subject >> patch is already applied) - this one comes from threaded irq (if I got >> the stack trace right): > netif_rx() did only set the matching softirq bit and not more. Based on > that I don't see why NOHZ shouldn't complain about a pending softirq > once the CPU goes idle. Therefore I think the change I made is good > since it uncovered that. > >> ------------[ cut here ]------------ >> WARNING: CPU: 0 PID: 147 at kernel/softirq.c:363 >> __local_bh_enable_ip+0xa8/0x1c0 > … >> CPU: 0 PID: 147 Comm: irq/150-dwc3 Not tainted 5.17.0-rc4-next-20220217+ >> #4557 >> Hardware name: Samsung TM2E board (DT) >> pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) >> pc : __local_bh_enable_ip+0xa8/0x1c0 >> lr : netif_rx+0xa4/0x2c0 >> ... >> >> Call trace: >> __local_bh_enable_ip+0xa8/0x1c0 >> netif_rx+0xa4/0x2c0 >> rx_complete+0x214/0x250 >> usb_gadget_giveback_request+0x58/0x170 >> dwc3_gadget_giveback+0xe4/0x200 >> dwc3_gadget_endpoint_trbs_complete+0x100/0x388 >> dwc3_thread_interrupt+0x46c/0xe20 > So dwc3_thread_interrupt() disables interrupts here. Felipe dropped it > and then added it back in > e5f68b4a3e7b0 ("Revert "usb: dwc3: gadget: remove unnecessary _irqsave()"") > > I would suggest to revert it (the above commit) and fixing the lockdep > splat in the gadget driver and other. I don't see the g_ether warning > Felipe mentioned. It might come from f_ncm (since it uses a timer) or > something else in the network stack (that uses a timeout timer). > But not now. > As much as I hate it, I suggest: > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c > index 183b90923f51b..a0c883f19a417 100644 > --- a/drivers/usb/dwc3/gadget.c > +++ b/drivers/usb/dwc3/gadget.c > @@ -4160,9 +4160,11 @@ static irqreturn_t dwc3_thread_interrupt(int irq, void *_evt) > unsigned long flags; > irqreturn_t ret = IRQ_NONE; > > + local_bh_disable(); > spin_lock_irqsave(&dwc->lock, flags); > ret = dwc3_process_event_buf(evt); > spin_unlock_irqrestore(&dwc->lock, flags); > + local_bh_enable(); > > return ret; > } > > > In the long run I would drop that irqsave (along with bh_disable() since > netif_rx() covers that) and make sure the there is no lockdep warning > popping up. > > Marek, could you please give it a try? Yes. The above change fixes the issue. I've also tried to revert the mentioned commit e5f68b4a3e7b, but this gives me the following lockdep warning: IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link becomes ready ======================================================== WARNING: possible irq lock inversion dependency detected 5.17.0-rc4-next-20220217+ #4563 Not tainted -------------------------------------------------------- swapper/0/0 just changed the state of lock: ffff000025548098 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x2f0/0x360 but this lock took another, SOFTIRQ-unsafe lock in the past: (&dev->lock#2){+.+.}-{2:2} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&dev->lock#2); local_irq_disable(); lock(_xmit_ETHER#2); lock(&dev->lock#2); <Interrupt> lock(_xmit_ETHER#2); *** DEADLOCK *** 5 locks held by swapper/0/0: #0: ffff80000a5f39e0 ((&ndev->rs_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x3b0 #1: ffff80000a695e00 (rcu_read_lock){....}-{1:2}, at: ndisc_send_skb+0x138/0x730 [ipv6] #2: ffff80000a696290 (rcu_read_lock_bh){....}-{1:2}, at: ip6_finish_output2+0x94/0xbe8 [ipv6] #3: ffff80000a696290 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x64/0xec8 #4: ffff0000246cd258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_queue_xmit+0x9a4/0xec8 the shortest dependencies between 2nd lock and 1st lock: -> (&dev->lock#2){+.+.}-{2:2} { HARDIRQ-ON-W at: lock_acquire+0x120/0x3c8 _raw_spin_lock+0x58/0x78 gether_connect+0x118/0x1c0 ecm_set_alt+0xf4/0x170 composite_setup+0x8d0/0x1820 dwc3_ep0_delegate_req+0x40/0x68 dwc3_ep0_interrupt+0x5a4/0xf88 dwc3_thread_interrupt+0xb7c/0xe18 irq_thread_fn+0x28/0x98 irq_thread+0x184/0x238 kthread+0x100/0x120 ret_from_fork+0x10/0x20 SOFTIRQ-ON-W at: lock_acquire+0x120/0x3c8 _raw_spin_lock+0x58/0x78 gether_connect+0x118/0x1c0 ecm_set_alt+0xf4/0x170 composite_setup+0x8d0/0x1820 dwc3_ep0_delegate_req+0x40/0x68 dwc3_ep0_interrupt+0x5a4/0xf88 dwc3_thread_interrupt+0xb7c/0xe18 irq_thread_fn+0x28/0x98 irq_thread+0x184/0x238 kthread+0x100/0x120 ret_from_fork+0x10/0x20 INITIAL USE at: lock_acquire+0x120/0x3c8 _raw_spin_lock_irq+0x74/0xa0 eth_open+0x28/0x78 __dev_open+0x100/0x190 __dev_change_flags+0x16c/0x1b8 dev_change_flags+0x20/0x60 do_setlink+0x314/0xd28 __rtnl_newlink+0x418/0x798 rtnl_newlink+0x4c/0x78 rtnetlink_rcv_msg+0x2bc/0x4e0 netlink_rcv_skb+0xe8/0x130 rtnetlink_rcv+0x14/0x20 netlink_unicast+0x1d4/0x280 netlink_sendmsg+0x2cc/0x408 ____sys_sendmsg+0x258/0x290 ___sys_sendmsg+0x80/0xc0 __sys_sendmsg+0x60/0xb8 __arm64_sys_sendmsg+0x1c/0x28 invoke_syscall+0x40/0xf8 el0_svc_common.constprop.3+0x8c/0x120 do_el0_svc+0x20/0x98 el0_svc+0x48/0x100 el0t_64_sync_handler+0x8c/0xb0 el0t_64_sync+0x15c/0x160 } ... key at: [<ffff80000b5004b8>] __key.60862+0x0/0x10 ... acquired at: _raw_spin_lock_irqsave+0x78/0x148 eth_start_xmit+0x30/0x3a0 dev_hard_start_xmit+0x108/0x3d8 sch_direct_xmit+0xf4/0x360 __dev_queue_xmit+0xa10/0xec8 dev_queue_xmit+0x10/0x18 neigh_resolve_output+0x18c/0x290 ip6_finish_output2+0x20c/0xbe8 [ipv6] __ip6_finish_output+0x104/0x2b8 [ipv6] ip6_finish_output+0x30/0x108 [ipv6] ip6_output+0x80/0x360 [ipv6] mld_sendpack+0x570/0x5a0 [ipv6] mld_ifc_work+0x2a4/0x4a0 [ipv6] process_one_work+0x29c/0x6b0 worker_thread+0x48/0x420 kthread+0x100/0x120 ret_from_fork+0x10/0x20 -> (_xmit_ETHER#2){+.-.}-{2:2} { HARDIRQ-ON-W at: lock_acquire+0x120/0x3c8 _raw_spin_lock+0x58/0x78 sch_direct_xmit+0x2f0/0x360 __dev_queue_xmit+0xa10/0xec8 dev_queue_xmit+0x10/0x18 neigh_resolve_output+0x18c/0x290 ip6_finish_output2+0x20c/0xbe8 [ipv6] __ip6_finish_output+0x104/0x2b8 [ipv6] ip6_finish_output+0x30/0x108 [ipv6] ip6_output+0x80/0x360 [ipv6] mld_sendpack+0x570/0x5a0 [ipv6] mld_ifc_work+0x2a4/0x4a0 [ipv6] process_one_work+0x29c/0x6b0 worker_thread+0x48/0x420 kthread+0x100/0x120 ret_from_fork+0x10/0x20 IN-SOFTIRQ-W at: lock_acquire+0x120/0x3c8 _raw_spin_lock+0x58/0x78 sch_direct_xmit+0x2f0/0x360 __dev_queue_xmit+0xa10/0xec8 dev_queue_xmit+0x10/0x18 ip6_finish_output2+0x8bc/0xbe8 [ipv6] __ip6_finish_output+0x104/0x2b8 [ipv6] ip6_finish_output+0x30/0x108 [ipv6] ip6_output+0x80/0x360 [ipv6] ndisc_send_skb+0x464/0x730 [ipv6] ndisc_send_rs+0x58/0x138 [ipv6] addrconf_rs_timer+0x140/0x218 [ipv6] call_timer_fn+0xb4/0x3b0 run_timer_softirq+0x290/0x678 _stext+0x11c/0x5cc irq_exit_rcu+0x168/0x1a8 el1_interrupt+0x40/0x128 el1h_64_irq_handler+0x14/0x20 el1h_64_irq+0x64/0x68 arch_cpu_idle+0x14/0x20 default_idle_call+0x78/0x350 do_idle+0x1f0/0x280 cpu_startup_entry+0x24/0x80 rest_init+0x180/0x290 arch_call_rest_init+0xc/0x14 start_kernel+0x694/0x6cc __primary_switched+0xc0/0xc8 INITIAL USE at: lock_acquire+0x120/0x3c8 _raw_spin_lock+0x58/0x78 sch_direct_xmit+0x2f0/0x360 __dev_queue_xmit+0xa10/0xec8 dev_queue_xmit+0x10/0x18 neigh_resolve_output+0x18c/0x290 ip6_finish_output2+0x20c/0xbe8 [ipv6] __ip6_finish_output+0x104/0x2b8 [ipv6] ip6_finish_output+0x30/0x108 [ipv6] ip6_output+0x80/0x360 [ipv6] mld_sendpack+0x570/0x5a0 [ipv6] mld_ifc_work+0x2a4/0x4a0 [ipv6] process_one_work+0x29c/0x6b0 worker_thread+0x48/0x420 kthread+0x100/0x120 ret_from_fork+0x10/0x20 } ... key at: [<ffff80000b507ca8>] netdev_xmit_lock_key+0x10/0x390 ... acquired at: __lock_acquire+0x564/0x16f8 lock_acquire+0x120/0x3c8 _raw_spin_lock+0x58/0x78 sch_direct_xmit+0x2f0/0x360 __dev_queue_xmit+0xa10/0xec8 dev_queue_xmit+0x10/0x18 ip6_finish_output2+0x8bc/0xbe8 [ipv6] __ip6_finish_output+0x104/0x2b8 [ipv6] ip6_finish_output+0x30/0x108 [ipv6] ip6_output+0x80/0x360 [ipv6] ndisc_send_skb+0x464/0x730 [ipv6] ndisc_send_rs+0x58/0x138 [ipv6] addrconf_rs_timer+0x140/0x218 [ipv6] call_timer_fn+0xb4/0x3b0 run_timer_softirq+0x290/0x678 _stext+0x11c/0x5cc irq_exit_rcu+0x168/0x1a8 el1_interrupt+0x40/0x128 el1h_64_irq_handler+0x14/0x20 el1h_64_irq+0x64/0x68 arch_cpu_idle+0x14/0x20 default_idle_call+0x78/0x350 do_idle+0x1f0/0x280 cpu_startup_entry+0x24/0x80 rest_init+0x180/0x290 arch_call_rest_init+0xc/0x14 start_kernel+0x694/0x6cc __primary_switched+0xc0/0xc8 stack backtrace: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.17.0-rc4-next-20220217+ #4563 Hardware name: Samsung TM2E board (DT) Call trace: dump_backtrace.part.6+0xd8/0xe8 show_stack+0x14/0x60 dump_stack_lvl+0x88/0xb0 dump_stack+0x14/0x2c print_irq_inversion_bug+0x194/0x1f8 mark_lock.part.52+0x29c/0x440 __lock_acquire+0x564/0x16f8 lock_acquire+0x120/0x3c8 _raw_spin_lock+0x58/0x78 sch_direct_xmit+0x2f0/0x360 __dev_queue_xmit+0xa10/0xec8 dev_queue_xmit+0x10/0x18 ip6_finish_output2+0x8bc/0xbe8 [ipv6] __ip6_finish_output+0x104/0x2b8 [ipv6] ip6_finish_output+0x30/0x108 [ipv6] ip6_output+0x80/0x360 [ipv6] ndisc_send_skb+0x464/0x730 [ipv6] ndisc_send_rs+0x58/0x138 [ipv6] addrconf_rs_timer+0x140/0x218 [ipv6] call_timer_fn+0xb4/0x3b0 run_timer_softirq+0x290/0x678 _stext+0x11c/0x5cc irq_exit_rcu+0x168/0x1a8 el1_interrupt+0x40/0x128 el1h_64_irq_handler+0x14/0x20 el1h_64_irq+0x64/0x68 arch_cpu_idle+0x14/0x20 default_idle_call+0x78/0x350 do_idle+0x1f0/0x280 cpu_startup_entry+0x24/0x80 rest_init+0x180/0x290 arch_call_rest_init+0xc/0x14 start_kernel+0x694/0x6cc __primary_switched+0xc0/0xc8 Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland