On Wed, Sep 29, 2021 at 10:53:33AM -0700, Paul E. McKenney wrote: > On Wed, Sep 29, 2021 at 02:37:08AM +0300, Vladimir Oltean wrote: > > __dev_queue_xmit(), which is our caller, does run under rcu_read_lock_bh(), > > but in my foolishness I had thought this would be enough to make the > > access, lockdep complains that rcu_read_lock() is not held. > > Depending on exactly which primitive is complaining, you can inform > lockdep of your intentions. For example, you can change > rcu_dereference(p) to rcu_dereference_bh(p). Or you can change: > > list_for_each_entry_rcu(p, &lh, field) { > ... > > To: > > list_for_each_entry_rcu(p, &lh, field, rcu_read_lock_bh_held()) { > ... > > And hlist_for_each_entry_rcu() can also take that same optional > lockdep parameter. This is covered by my first option below, basically I don't want to bloat the Ethernet bridge driver too much with a single user, and that user being outside the bridge code itself, at that. I'm sure Nikolay and Roopa would agree :) The bridge xmit function - br_dev_xmit - takes the rcu_preempt rcu_read_lock() too. Although I don't think it is very high-overhead in any incarnation, I think it would be seen as a positive improvement if it could be removed from that path too (or would it?) > > Which it isn't - as it turns out, RCU preempt and RCU-bh are two > > different flavors, and although Paul McKenney has consolidated > > synchronize_rcu() to wait for both preempt as well as bh read-side > > critical sections [1], the reader-side API is different, the lockdep > > maps and keys are different. > > > > The bridge calls synchronize_rcu() in br_vlan_flush(), and this does > > wait for our TX fastpath reader of the br_vlan_group_rcu to complete > > even though it is in an rcu-bh read side section. So even though this is > > in premise safe, to lockdep this is a case of "who are you? I don't know > > you, you're suspicious". > > > > Side note, I still don't really understand the different RCU flavors. > > RCU BH was there to handle denial-of-service networking loads. > Changes over the years to RCU and to softirq have rendered it obsolete. > But rcu_read_lock_bh() still disables softirq for you. Thank you, I guess? :) > RCU Sched provides the original semantics. > > RCU Preempt, as the name suggests, allows RCU readers to be preempted. > Of course, if you are using rcu_read_lock_sched() or rcu_read_lock_bh(), > you are disabling preemption across the critical section. > > > For example, as far as I can see, the core network stack has never > > directly called synchronize_rcu_bh, not even once. Just the initial > > synchronize_kernel(), replaced later with the RCU preempt variant - > > synchronize_rcu(). Very very long story short, dev_queue_xmit has > > started calling this exact variant - rcu_read_lock_bh() - since [2], to > > make dev_deactivate properly wait for network interfaces with > > NETIF_F_LLTX to finish their dev_queue_xmit(). But that relied on an > > existing synchronize_rcu(), not synchronize_rcu_bh(). So does this mean > > that synchronize_net() never really waited for the rcu-bh critical > > section in dev_queue_xmit to finish? I've no idea. > > The pre-consolidation Linux kernel v4.16 has these calls to > synchronize_rcu_bh(): > > drivers/net/team/team.c team_port_disable_netpoll 1094 synchronize_rcu_bh(); > drivers/vhost/net.c vhost_net_release 1027 synchronize_rcu_bh(); > net/netfilter/ipset/ip_set_hash_gen.h mtype_resize 667 synchronize_rcu_bh(); > > But to your point, nothing in net/core. > > And for v4.16 kernels build with CONFIG_PREEMPT=y, there is no guarantee > that synchronize_rcu() will wait for a rcu_read_lock_bh() critical > section. A CPU in such a critical section could take a scheduling-clock > interrupt, notice that it was not in an rcu_read_lock() critical section, > and report a quiescent state, which could well end that grace period. I find it really hard to believe that commit d4828d85d188 ("[NET]: Prevent transmission after dev_deactivate") did not provide the guarantee it promised. It seems much more likely that I'm missing something, although I don't see what :) But the team driver, which I did notice, and which you've linked to above as well, did have a comment which suggested that yes, if you don't call synchronize_rcu_bh(), you don't really wait for __dev_queue_xmit() to finish. > But as you say, in more recent kernels, synchronize_rcu() will indeed > wait for rcu_read_lock_bh() critical sections. > > But please be very careful when backporting. No concerns with backporting, the code in question was added last month or so. > > So basically there are multiple options. > > > > First would be to duplicate br_vlan_get_pvid_rcu() into a new > > br_vlan_get_pvid_rcu_bh() to appease lockdep for the TX path case. But > > this function already has another brother, br_vlan_get_pvid(), which is > > protected by the update-side rtnl_mutex. We don't want to grow the > > family too big too, especially since br_vlan_get_pvid_rcu_bh() would not > > be a function used by the bridge at all, just exported by it and used by > > the DSA layer. > > > > The option of getting to the bottom of why does __dev_queue_xmit use > > rcu-bh, and splitting that into local_bh_disable + rcu_read_lock, as it > > was before [3], might be impractical. There have been 15 years of > > development since then, and there are lots of code paths that use > > rcu_dereference_bh() in the TX path. Plus, with the consolidation work > > done in [1], I'm not even sure what are the practical benefits of rcu-bh > > any longer, if the whole point was for synchronize_rcu() to wait for > > everything in sight - how can spammy softirqs like networking paint > > themselves red any longer, and how can certain RCU updaters not wait for > > them now, in order to avoid denial of service? It doesn't appear > > possible from the distance from which I'm looking at the problem. > > So the effort of converting __dev_queue_xmit from rcu-bh to rcu-preempt > > would only appear justified if it went together with the complete > > elimination of rcu-bh. Also, it would appear to be quite a strange and > > roundabout way to fix a "suspicious RCU usage" lockdep message. > > The thing to be very careful of is code that might be implicitly assuming > that it cannot be interrupted by a softirq handler. This assumption will > of course be violated by changing rcu_read_lock_bh() to rcu_read_lock(). > The resulting low-probability subtle breakage might be hard to find. Of course, the networking code would not change functionally with the removal of rcu_read_lock_bh(). So rcu_read_lock_bh() would be transformed into local_bh_disable() + rcu_read_lock(). I am still a bit unclear on the details, but there are reasons why we need softirqs disabled - we don't call hard_start_xmit just from the NET_TX softirq, that would be just too nice :) > > Last, it appears possible to just give lockdep what it wants, and hold > > an rcu-preempt read-side critical section when calling br_vlan_get_pvid_rcu > > from the TX path. In terms of lines of code and amount of thought needed > > it is certainly the easiest path forward, even though it incurs a small > > (negligible) performance overhead (and avoidable, at that). This is what > > this patch does, in lack of a deeper understanding of lockdep, RCU or > > the network transmission process. > > > > [1] https://lwn.net/Articles/777036/ > > [2] commit d4828d85d188 ("[NET]: Prevent transmission after dev_deactivate") > > [3] commit 43da55cbd54e ("[NET]: Do less atomic count changes in dev_queue_xmit.") > > > > Fixes: d82f8ab0d874 ("net: dsa: tag_dsa: offload the bridge forwarding process") > > Signed-off-by: Vladimir Oltean <vladimir.oltean@xxxxxxx> > > Of course, if no one really needs rcu_read_lock_bh() anymore, I would be > quite happy to simplify my life by getting rid of it. ;-) > > Thanx, Paul The basic idea that updaters could be resilient against softirq storms sounds great in principle. Although if I understand correctly, that went away with the consolidation. So again, it isn't that some resiliency wouldn't be nice, but I'm looking at the current code and I just don't see what the benefits of rcu_bh are. If the answer is as self-evident as a naive person like me thinks it is - i.e. rcu_read_lock_bh is just a glorified version of rcu_read_lock which also disables softirqs, just with different lockdep rules - then is the consolidation really complete? Couldn't the bh-disable readers be modified to just open-code the disabling of softirqs, and resolve the lockdep issues that ensue from having a separate flavor? > > --- > > net/dsa/tag_dsa.c | 5 ++--- > > 1 file changed, 2 insertions(+), 3 deletions(-) > > > > diff --git a/net/dsa/tag_dsa.c b/net/dsa/tag_dsa.c > > index 77d0ce89ab77..178464cd2bdb 100644 > > --- a/net/dsa/tag_dsa.c > > +++ b/net/dsa/tag_dsa.c > > @@ -150,10 +150,9 @@ static struct sk_buff *dsa_xmit_ll(struct sk_buff *skb, struct net_device *dev, > > * that's where the packets ingressed from. > > */ > > if (!br_vlan_enabled(br)) { > > - /* Safe because __dev_queue_xmit() runs under > > - * rcu_read_lock_bh() > > - */ > > + rcu_read_lock(); > > err = br_vlan_get_pvid_rcu(br, &pvid); > > + rcu_read_unlock(); > > if (err) > > return NULL; > > } > > -- > > 2.25.1 > >