On Fri, Nov 26, 2021 at 9:09 AM Luis Goncalves <lgoncalv@xxxxxxxxxx> wrote: > > On Thu, Nov 25, 2021 at 8:11 AM Sebastian Andrzej Siewior > <bigeasy@xxxxxxxxxxxxx> wrote: > > > > On 2021-11-25 11:01:10 [+0100], To Luis Goncalves wrote: > > > > > > It makes sense to only change that one invocation. Let me form a patch. > > > > So I made dis: > > > > | From aaef7596167b3b38a81567af81aaa2f9559c91df Mon Sep 17 00:00:00 2001 > > | From: "Luis Claudio R. Goncalves" <lgoncalv@xxxxxxxxxx> > > | Date: Thu, 25 Nov 2021 11:05:17 +0100 > > | Subject: [PATCH] net-sysfs: Acquire dev_base_lock disabled BH in > > | netstat_show() > > | > > | The writer acquires dev_base_lock with disabled bottom halves. > > | The reader can acquire dev_base_lock without disabling bottom halves > > | because there is no writer in softirq context. Also readers can acquire > > | the lock in task and softirq context and multiple readers are allowed. > > | > > | On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts > > | as a lock to ensure that resources, that are protected by disabling > > | bottom halves, remain protected. > > | This leads to a circular locking dependency if the lock acquired with > > | disabled bottom halves (as in write_lock_bh()) and somewhere else with > > | enabled bottom halves (as by read_lock() in netstat_show()) followed by > > | disabling bottom halves (cxgb_get_stats() -> t4_wr_mbox_meat_timeout() > > | -> spin_lock_bh()). This is the reverse locking order. > > | > > | In order to avoid the reverse locking order, acquire the dev_base_lock > > | with disabled bottom halves. > > | > > | Reported-by: Pei Zhang <pezhang@xxxxxxxxxx> > > | Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@xxxxxxxxxx> > > | Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> > > > > and then I started locking for in-softirq reader or writer. I can't find > > any. Therefore I made this for testing: > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -371,12 +371,12 @@ static void list_netdevice(struct net_device *dev) > > > > ASSERT_RTNL(); > > > > - write_lock_bh(&dev_base_lock); > > + write_lock(&dev_base_lock); > > list_add_tail_rcu(&dev->dev_list, &net->dev_base_head); > > netdev_name_node_add(net, dev->name_node); > > hlist_add_head_rcu(&dev->index_hlist, > > dev_index_hash(net, dev->ifindex)); > > - write_unlock_bh(&dev_base_lock); > > + write_unlock(&dev_base_lock); > > > > dev_base_seq_inc(net); > > } > > @@ -389,11 +389,11 @@ static void unlist_netdevice(struct net_device *dev) > > ASSERT_RTNL(); > > > > /* Unlink dev from the device chain */ > > - write_lock_bh(&dev_base_lock); > > + write_lock(&dev_base_lock); > > list_del_rcu(&dev->dev_list); > > netdev_name_node_del(dev->name_node); > > hlist_del_rcu(&dev->index_hlist); > > - write_unlock_bh(&dev_base_lock); > > + write_unlock(&dev_base_lock); > > > > dev_base_seq_inc(dev_net(dev)); > > } > > @@ -1272,15 +1272,15 @@ int dev_change_name(struct net_device *dev, const char *newname) > > > > netdev_adjacent_rename_links(dev, oldname); > > > > - write_lock_bh(&dev_base_lock); > > + write_lock(&dev_base_lock); > > netdev_name_node_del(dev->name_node); > > - write_unlock_bh(&dev_base_lock); > > + write_unlock(&dev_base_lock); > > > > synchronize_rcu(); > > > > - write_lock_bh(&dev_base_lock); > > + write_lock(&dev_base_lock); > > netdev_name_node_add(net, dev->name_node); > > - write_unlock_bh(&dev_base_lock); > > + write_unlock(&dev_base_lock); > > > > ret = call_netdevice_notifiers(NETDEV_CHANGENAME, dev); > > ret = notifier_to_errno(ret); > > diff --git a/net/core/link_watch.c b/net/core/link_watch.c > > --- a/net/core/link_watch.c > > +++ b/net/core/link_watch.c > > @@ -55,7 +55,7 @@ static void rfc2863_policy(struct net_device *dev) > > if (operstate == dev->operstate) > > return; > > > > - write_lock_bh(&dev_base_lock); > > + write_lock(&dev_base_lock); > > > > switch(dev->link_mode) { > > case IF_LINK_MODE_TESTING: > > @@ -74,7 +74,7 @@ static void rfc2863_policy(struct net_device *dev) > > > > dev->operstate = operstate; > > > > - write_unlock_bh(&dev_base_lock); > > + write_unlock(&dev_base_lock); > > } > > > > > > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c > > --- a/net/core/rtnetlink.c > > +++ b/net/core/rtnetlink.c > > @@ -842,9 +842,9 @@ static void set_operstate(struct net_device *dev, unsigned char transition) > > } > > > > if (dev->operstate != operstate) { > > - write_lock_bh(&dev_base_lock); > > + write_lock(&dev_base_lock); > > dev->operstate = operstate; > > - write_unlock_bh(&dev_base_lock); > > + write_unlock(&dev_base_lock); > > netdev_state_change(dev); > > } > > } > > @@ -2779,11 +2779,11 @@ static int do_setlink(const struct sk_buff *skb, > > if (tb[IFLA_LINKMODE]) { > > unsigned char value = nla_get_u8(tb[IFLA_LINKMODE]); > > > > - write_lock_bh(&dev_base_lock); > > + write_lock(&dev_base_lock); > > if (dev->link_mode ^ value) > > status |= DO_SETLINK_NOTIFY; > > dev->link_mode = value; > > - write_unlock_bh(&dev_base_lock); > > + write_unlock(&dev_base_lock); > > } > > > > if (tb[IFLA_VFINFO_LIST]) { > > diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c > > --- a/net/hsr/hsr_device.c > > +++ b/net/hsr/hsr_device.c > > @@ -30,13 +30,13 @@ static bool is_slave_up(struct net_device *dev) > > > > static void __hsr_set_operstate(struct net_device *dev, int transition) > > { > > - write_lock_bh(&dev_base_lock); > > + write_lock(&dev_base_lock); > > if (dev->operstate != transition) { > > dev->operstate = transition; > > - write_unlock_bh(&dev_base_lock); > > + write_unlock(&dev_base_lock); > > netdev_state_change(dev); > > } else { > > - write_unlock_bh(&dev_base_lock); > > + write_unlock(&dev_base_lock); > > } > > } > > > > > > If this does not create any warnings (with or without RT) then I would > > suggest to remove that _bh suffix since there is no need for it. This > > should also fix the RT warning you see now. > > Agreed? > > Sounds like a plan! > > I will start testing within the hour and report soon. Using v5.15.3-rt21 + your patch, I rebooted multiple times the two boxes where I had observed the netstat_show() lockdep complaint before. The kernel booted fine every single time, no netstat_show() lockdep complaints at all. I only tested a kernel with PREEMPT_RT enabled. Do you want me to test the vanilla kernel with your patch? Luis