Please disregard, wrong version in $Subject. Thxs, Håkon > On 11 Sep 2021, at 15:58, Håkon Bugge <haakon.bugge@xxxxxxxxxx> wrote: > > From: Guillaume Nault <gnault@xxxxxxxxxx> > > __peernet2id() can be protected by RCU as it only calls idr_for_each(), > which is RCU-safe, and never modifies the nsid table. > > rtnl_net_dumpid() can also do lockless lookups. It does two nested > idr_for_each() calls on nsid tables (one direct call and one indirect > call because of rtnl_net_dumpid_one() calling __peernet2id()). The > netnsid tables are never updated. Therefore it is safe to not take the > nsid_lock and run within an RCU-critical section instead. > > Signed-off-by: Guillaume Nault <gnault@xxxxxxxxxx> > Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> > > A nice side-effect of replacing spin_{lock,unlock}_bh() with > rcu_spin_{lock,unlock}() in peernet2id() is that it avoids the > situation where SoftIRQs get enabled whilst IRQs are turned off. > > From bugzilla.redhat.com/show_bug.cgi?id=1384179 (an ancient > 4.9.0-0.rc0 kernel): > > dump_stack+0x86/0xc3 > __warn+0xcb/0xf0 > warn_slowpath_null+0x1d/0x20 > __local_bh_enable_ip+0x9d/0xc0 > _raw_spin_unlock_bh+0x35/0x40 > peernet2id+0x54/0x80 > netlink_broadcast_filtered+0x220/0x3c0 > netlink_broadcast+0x1d/0x20 > audit_log+0x6a/0x90 > security_set_bools+0xee/0x200 > [] > > Note, security_set_bools() calls write_lock_irq(). peernet2id() calls > spin_unlock_bh(). > > From an internal (UEK) stack trace based on the v4.14.35 kernel (LTS > 4.14.231): > > queued_spin_lock_slowpath+0xb/0xf > _raw_spin_lock_irqsave+0x46/0x48 > send_mad+0x3d2/0x590 [ib_core] > ib_sa_path_rec_get+0x223/0x4d0 [ib_core] > path_rec_start+0xa3/0x140 [ib_ipoib] > ipoib_start_xmit+0x2b0/0x6a0 [ib_ipoib] > dev_hard_start_xmit+0xb2/0x237 > sch_direct_xmit+0x114/0x1bf > __dev_queue_xmit+0x592/0x818 > dev_queue_xmit+0x10/0x12 > arp_xmit+0x38/0xa6 > arp_send_dst.part.16+0x61/0x84 > arp_process+0x825/0x889 > arp_rcv+0x140/0x1c9 > __netif_receive_skb_core+0x401/0xb39 > __netif_receive_skb+0x18/0x59 > netif_receive_skb_internal+0x45/0x119 > napi_gro_receive+0xd8/0xf6 > ipoib_ib_handle_rx_wc+0x1ca/0x520 [ib_ipoib] > ipoib_poll+0xcd/0x150 [ib_ipoib] > net_rx_action+0x289/0x3f4 > __do_softirq+0xe1/0x2b5 > do_softirq_own_stack+0x2a/0x35 > </IRQ> > do_softirq+0x4d/0x6a > __local_bh_enable_ip+0x57/0x59 > _raw_spin_unlock_bh+0x23/0x25 > peernet2id+0x51/0x73 > netlink_broadcast_filtered+0x223/0x41b > netlink_broadcast+0x1d/0x1f > rdma_nl_multicast+0x22/0x30 [ib_core] > send_mad+0x3e5/0x590 [ib_core] > ib_sa_path_rec_get+0x223/0x4d0 [ib_core] > rdma_resolve_route+0x287/0x810 [rdma_cm] > rds_rdma_cm_event_handler_cmn+0x311/0x7d0 [rds_rdma] > rds_rdma_cm_event_handler_worker+0x22/0x30 [rds_rdma] > process_one_work+0x169/0x3a6 > worker_thread+0x4d/0x3e5 > kthread+0x105/0x138 > ret_from_fork+0x24/0x49 > > Here, pay attention to ib_nl_make_request() which calls > spin_lock_irqsave() on a global lock just before calling > rdma_nl_multicast(). Thereafter, peernet2id() enables SoftIRQs, and > ipoib starts and calls the same path and ends up trying to acquire the > same global lock again. > > (cherry picked from commit 2dce224f469f060b9998a5a869151ef83c08ce77) > > Fixes: fba143c66abb ("netns: avoid disabling irq for netns id") > Signed-off-by: Håkon Bugge <haakon.bugge@xxxxxxxxxx> > > Conflicts: > net/core/net_namespace.c > > * rtnl_valid_dump_net_req() has a very minimal > implementation in 4.19, hence only a simple > substituting of spin_{lock,unlock}_bh() with > rcu_spin_{lock,unlock}() was required > --- > net/core/net_namespace.c | 18 ++++++++++-------- > 1 file changed, 10 insertions(+), 8 deletions(-) > > diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c > index 939d8a3..26d70c0 100644 > --- a/net/core/net_namespace.c > +++ b/net/core/net_namespace.c > @@ -192,9 +192,9 @@ static int net_eq_idr(int id, void *net, void *peer) > return 0; > } > > -/* Should be called with nsid_lock held. If a new id is assigned, the bool alloc > - * is set to true, thus the caller knows that the new id must be notified via > - * rtnl. > +/* Must be called from RCU-critical section or with nsid_lock held. If > + * a new id is assigned, the bool alloc is set to true, thus the > + * caller knows that the new id must be notified via rtnl. > */ > static int __peernet2id_alloc(struct net *net, struct net *peer, bool *alloc) > { > @@ -218,7 +218,7 @@ static int __peernet2id_alloc(struct net *net, struct net *peer, bool *alloc) > return NETNSA_NSID_NOT_ASSIGNED; > } > > -/* should be called with nsid_lock held */ > +/* Must be called from RCU-critical section or with nsid_lock held */ > static int __peernet2id(struct net *net, struct net *peer) > { > bool no = false; > @@ -261,9 +261,10 @@ int peernet2id(struct net *net, struct net *peer) > { > int id; > > - spin_lock_bh(&net->nsid_lock); > + rcu_read_lock(); > id = __peernet2id(net, peer); > - spin_unlock_bh(&net->nsid_lock); > + rcu_read_unlock(); > + > return id; > } > EXPORT_SYMBOL(peernet2id); > @@ -837,6 +838,7 @@ struct rtnl_net_dump_cb { > int s_idx; > }; > > +/* Runs in RCU-critical section. */ > static int rtnl_net_dumpid_one(int id, void *peer, void *data) > { > struct rtnl_net_dump_cb *net_cb = (struct rtnl_net_dump_cb *)data; > @@ -867,9 +869,9 @@ static int rtnl_net_dumpid(struct sk_buff *skb, struct netlink_callback *cb) > .s_idx = cb->args[0], > }; > > - spin_lock_bh(&net->nsid_lock); > + rcu_read_lock(); > idr_for_each(&net->netns_ids, rtnl_net_dumpid_one, &net_cb); > - spin_unlock_bh(&net->nsid_lock); > + rcu_read_unlock(); > > cb->args[0] = net_cb.idx; > return skb->len; > -- > 1.8.3.1 >