On 9/26/24 02:03, Willem de Bruijn wrote:
greearb@ wrote:
From: Ben Greear <greearb@xxxxxxxxxxxxxxx>
This reverts commit 504fc6f4f7f681d2a03aa5f68aad549d90eab853.
dev_queue_xmit_nit needs to run with bh locking, otherwise
it conflicts with packets coming in from a nic in softirq
context and packets being transmitted from user context.
================================
WARNING: inconsistent lock state
6.11.0 #1 Tainted: G W
--------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
btserver/134819 [HC0[0]:SC0[0]:HE1:SE1] takes:
ffff8882da30c118 (rlock-AF_PACKET){+.?.}-{2:2}, at: tpacket_rcv+0x863/0x3b30
{IN-SOFTIRQ-W} state was registered at:
lock_acquire+0x19a/0x4f0
_raw_spin_lock+0x27/0x40
packet_rcv+0xa33/0x1320
__netif_receive_skb_core.constprop.0+0xcb0/0x3a90
__netif_receive_skb_list_core+0x2c9/0x890
netif_receive_skb_list_internal+0x610/0xcc0
napi_complete_done+0x1c0/0x7c0
igb_poll+0x1dbb/0x57e0 [igb]
__napi_poll.constprop.0+0x99/0x430
net_rx_action+0x8e7/0xe10
handle_softirqs+0x1b7/0x800
__irq_exit_rcu+0x91/0xc0
irq_exit_rcu+0x5/0x10
[snip]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(rlock-AF_PACKET);
<Interrupt>
lock(rlock-AF_PACKET);
*** DEADLOCK ***
Call Trace:
<TASK>
dump_stack_lvl+0x73/0xa0
mark_lock+0x102e/0x16b0
__lock_acquire+0x9ae/0x6170
lock_acquire+0x19a/0x4f0
_raw_spin_lock+0x27/0x40
tpacket_rcv+0x863/0x3b30
dev_queue_xmit_nit+0x709/0xa40
vrf_finish_direct+0x26e/0x340 [vrf]
vrf_l3_out+0x5f4/0xe80 [vrf]
__ip_local_out+0x51e/0x7a0
[snip]
Fixes: 504fc6f4f7f6 ("vrf: Remove unnecessary RCU-bh critical section")
Link: https://lore.kernel.org/netdev/05765015-f727-2f30-58da-2ad6fa7ea99f@xxxxxxxxxxxxxxx/T/
Signed-off-by: Ben Greear <greearb@xxxxxxxxxxxxxxx>
Please Cc: all previous reviewers and folks who participated in the
discussion. I entirely missed this. No need to add as Cc tags, just
--cc in git send-email will do.
---
v2: Edit patch description.
drivers/net/vrf.c | 2 ++
net/core/dev.c | 1 +
2 files changed, 3 insertions(+)
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 4d8ccaf9a2b4..4087f72f0d2b 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -608,7 +608,9 @@ static void vrf_finish_direct(struct sk_buff *skb)
eth_zero_addr(eth->h_dest);
eth->h_proto = skb->protocol;
+ rcu_read_lock_bh();
dev_queue_xmit_nit(skb, vrf_dev);
+ rcu_read_unlock_bh();
skb_pull(skb, ETH_HLEN);
}
diff --git a/net/core/dev.c b/net/core/dev.c
index cd479f5f22f6..566e69a38eed 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2285,6 +2285,7 @@ EXPORT_SYMBOL_GPL(dev_nit_active);
/*
* Support routine. Sends outgoing frames to any network
* taps currently in use.
+ * BH must be disabled before calling this.
Unnecessary. Especially patches for stable should be minimal to
reduce the chance of conflicts.
I was asked to add this, and also asked to CC stable.
Thanks,
Ben
--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc http://www.candelatech.com