Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Nov 26, 2021 at 9:09 AM Luis Goncalves <lgoncalv@xxxxxxxxxx> wrote:
>
> On Thu, Nov 25, 2021 at 8:11 AM Sebastian Andrzej Siewior
> <bigeasy@xxxxxxxxxxxxx> wrote:
> >
> > On 2021-11-25 11:01:10 [+0100], To Luis Goncalves wrote:
> > >
> > > It makes sense to only change that one invocation. Let me form a patch.
> >
> > So I made dis:
> >
> > | From aaef7596167b3b38a81567af81aaa2f9559c91df Mon Sep 17 00:00:00 2001
> > | From: "Luis Claudio R. Goncalves" <lgoncalv@xxxxxxxxxx>
> > | Date: Thu, 25 Nov 2021 11:05:17 +0100
> > | Subject: [PATCH] net-sysfs: Acquire dev_base_lock disabled BH in
> > |  netstat_show()
> > |
> > | The writer acquires dev_base_lock with disabled bottom halves.
> > | The reader can acquire dev_base_lock without disabling bottom halves
> > | because there is no writer in softirq context. Also readers can acquire
> > | the lock in task and softirq context and multiple readers are allowed.
> > |
> > | On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
> > | as a lock to ensure that resources, that are protected by disabling
> > | bottom halves, remain protected.
> > | This leads to a circular locking dependency if the lock acquired with
> > | disabled bottom halves (as in write_lock_bh()) and somewhere else with
> > | enabled bottom halves (as by read_lock() in netstat_show()) followed by
> > | disabling bottom halves (cxgb_get_stats() -> t4_wr_mbox_meat_timeout()
> > | -> spin_lock_bh()). This is the reverse locking order.
> > |
> > | In order to avoid the reverse locking order, acquire the dev_base_lock
> > | with disabled bottom halves.
> > |
> > | Reported-by: Pei Zhang <pezhang@xxxxxxxxxx>
> > | Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@xxxxxxxxxx>
> > | Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> >
> > and then I started locking for in-softirq reader or writer. I can't find
> > any. Therefore I made this for testing:
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -371,12 +371,12 @@ static void list_netdevice(struct net_device *dev)
> >
> >         ASSERT_RTNL();
> >
> > -       write_lock_bh(&dev_base_lock);
> > +       write_lock(&dev_base_lock);
> >         list_add_tail_rcu(&dev->dev_list, &net->dev_base_head);
> >         netdev_name_node_add(net, dev->name_node);
> >         hlist_add_head_rcu(&dev->index_hlist,
> >                            dev_index_hash(net, dev->ifindex));
> > -       write_unlock_bh(&dev_base_lock);
> > +       write_unlock(&dev_base_lock);
> >
> >         dev_base_seq_inc(net);
> >  }
> > @@ -389,11 +389,11 @@ static void unlist_netdevice(struct net_device *dev)
> >         ASSERT_RTNL();
> >
> >         /* Unlink dev from the device chain */
> > -       write_lock_bh(&dev_base_lock);
> > +       write_lock(&dev_base_lock);
> >         list_del_rcu(&dev->dev_list);
> >         netdev_name_node_del(dev->name_node);
> >         hlist_del_rcu(&dev->index_hlist);
> > -       write_unlock_bh(&dev_base_lock);
> > +       write_unlock(&dev_base_lock);
> >
> >         dev_base_seq_inc(dev_net(dev));
> >  }
> > @@ -1272,15 +1272,15 @@ int dev_change_name(struct net_device *dev, const char *newname)
> >
> >         netdev_adjacent_rename_links(dev, oldname);
> >
> > -       write_lock_bh(&dev_base_lock);
> > +       write_lock(&dev_base_lock);
> >         netdev_name_node_del(dev->name_node);
> > -       write_unlock_bh(&dev_base_lock);
> > +       write_unlock(&dev_base_lock);
> >
> >         synchronize_rcu();
> >
> > -       write_lock_bh(&dev_base_lock);
> > +       write_lock(&dev_base_lock);
> >         netdev_name_node_add(net, dev->name_node);
> > -       write_unlock_bh(&dev_base_lock);
> > +       write_unlock(&dev_base_lock);
> >
> >         ret = call_netdevice_notifiers(NETDEV_CHANGENAME, dev);
> >         ret = notifier_to_errno(ret);
> > diff --git a/net/core/link_watch.c b/net/core/link_watch.c
> > --- a/net/core/link_watch.c
> > +++ b/net/core/link_watch.c
> > @@ -55,7 +55,7 @@ static void rfc2863_policy(struct net_device *dev)
> >         if (operstate == dev->operstate)
> >                 return;
> >
> > -       write_lock_bh(&dev_base_lock);
> > +       write_lock(&dev_base_lock);
> >
> >         switch(dev->link_mode) {
> >         case IF_LINK_MODE_TESTING:
> > @@ -74,7 +74,7 @@ static void rfc2863_policy(struct net_device *dev)
> >
> >         dev->operstate = operstate;
> >
> > -       write_unlock_bh(&dev_base_lock);
> > +       write_unlock(&dev_base_lock);
> >  }
> >
> >
> > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > --- a/net/core/rtnetlink.c
> > +++ b/net/core/rtnetlink.c
> > @@ -842,9 +842,9 @@ static void set_operstate(struct net_device *dev, unsigned char transition)
> >         }
> >
> >         if (dev->operstate != operstate) {
> > -               write_lock_bh(&dev_base_lock);
> > +               write_lock(&dev_base_lock);
> >                 dev->operstate = operstate;
> > -               write_unlock_bh(&dev_base_lock);
> > +               write_unlock(&dev_base_lock);
> >                 netdev_state_change(dev);
> >         }
> >  }
> > @@ -2779,11 +2779,11 @@ static int do_setlink(const struct sk_buff *skb,
> >         if (tb[IFLA_LINKMODE]) {
> >                 unsigned char value = nla_get_u8(tb[IFLA_LINKMODE]);
> >
> > -               write_lock_bh(&dev_base_lock);
> > +               write_lock(&dev_base_lock);
> >                 if (dev->link_mode ^ value)
> >                         status |= DO_SETLINK_NOTIFY;
> >                 dev->link_mode = value;
> > -               write_unlock_bh(&dev_base_lock);
> > +               write_unlock(&dev_base_lock);
> >         }
> >
> >         if (tb[IFLA_VFINFO_LIST]) {
> > diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
> > --- a/net/hsr/hsr_device.c
> > +++ b/net/hsr/hsr_device.c
> > @@ -30,13 +30,13 @@ static bool is_slave_up(struct net_device *dev)
> >
> >  static void __hsr_set_operstate(struct net_device *dev, int transition)
> >  {
> > -       write_lock_bh(&dev_base_lock);
> > +       write_lock(&dev_base_lock);
> >         if (dev->operstate != transition) {
> >                 dev->operstate = transition;
> > -               write_unlock_bh(&dev_base_lock);
> > +               write_unlock(&dev_base_lock);
> >                 netdev_state_change(dev);
> >         } else {
> > -               write_unlock_bh(&dev_base_lock);
> > +               write_unlock(&dev_base_lock);
> >         }
> >  }
> >
> >
> > If this does not create any warnings (with or without RT) then I would
> > suggest to remove that _bh suffix since there is no need for it. This
> > should also fix the RT warning you see now.
> > Agreed?
>
> Sounds like a plan!
>
> I will start testing within the hour and report soon.

Using v5.15.3-rt21 + your patch, I rebooted multiple times the two
boxes where I had observed the netstat_show() lockdep complaint
before. The kernel booted fine every single time,  no netstat_show()
lockdep complaints at all.

I only tested a kernel with PREEMPT_RT enabled. Do you want me to test
the vanilla kernel with your patch?

Luis




[Index of Archives]     [Linux USB Development]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux