Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Nov 25, 2021 at 8:11 AM Sebastian Andrzej Siewior
<bigeasy@xxxxxxxxxxxxx> wrote:
>
> On 2021-11-25 11:01:10 [+0100], To Luis Goncalves wrote:
> >
> > It makes sense to only change that one invocation. Let me form a patch.
>
> So I made dis:
>
> | From aaef7596167b3b38a81567af81aaa2f9559c91df Mon Sep 17 00:00:00 2001
> | From: "Luis Claudio R. Goncalves" <lgoncalv@xxxxxxxxxx>
> | Date: Thu, 25 Nov 2021 11:05:17 +0100
> | Subject: [PATCH] net-sysfs: Acquire dev_base_lock disabled BH in
> |  netstat_show()
> |
> | The writer acquires dev_base_lock with disabled bottom halves.
> | The reader can acquire dev_base_lock without disabling bottom halves
> | because there is no writer in softirq context. Also readers can acquire
> | the lock in task and softirq context and multiple readers are allowed.
> |
> | On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
> | as a lock to ensure that resources, that are protected by disabling
> | bottom halves, remain protected.
> | This leads to a circular locking dependency if the lock acquired with
> | disabled bottom halves (as in write_lock_bh()) and somewhere else with
> | enabled bottom halves (as by read_lock() in netstat_show()) followed by
> | disabling bottom halves (cxgb_get_stats() -> t4_wr_mbox_meat_timeout()
> | -> spin_lock_bh()). This is the reverse locking order.
> |
> | In order to avoid the reverse locking order, acquire the dev_base_lock
> | with disabled bottom halves.
> |
> | Reported-by: Pei Zhang <pezhang@xxxxxxxxxx>
> | Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@xxxxxxxxxx>
> | Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
>
> and then I started locking for in-softirq reader or writer. I can't find
> any. Therefore I made this for testing:
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -371,12 +371,12 @@ static void list_netdevice(struct net_device *dev)
>
>         ASSERT_RTNL();
>
> -       write_lock_bh(&dev_base_lock);
> +       write_lock(&dev_base_lock);
>         list_add_tail_rcu(&dev->dev_list, &net->dev_base_head);
>         netdev_name_node_add(net, dev->name_node);
>         hlist_add_head_rcu(&dev->index_hlist,
>                            dev_index_hash(net, dev->ifindex));
> -       write_unlock_bh(&dev_base_lock);
> +       write_unlock(&dev_base_lock);
>
>         dev_base_seq_inc(net);
>  }
> @@ -389,11 +389,11 @@ static void unlist_netdevice(struct net_device *dev)
>         ASSERT_RTNL();
>
>         /* Unlink dev from the device chain */
> -       write_lock_bh(&dev_base_lock);
> +       write_lock(&dev_base_lock);
>         list_del_rcu(&dev->dev_list);
>         netdev_name_node_del(dev->name_node);
>         hlist_del_rcu(&dev->index_hlist);
> -       write_unlock_bh(&dev_base_lock);
> +       write_unlock(&dev_base_lock);
>
>         dev_base_seq_inc(dev_net(dev));
>  }
> @@ -1272,15 +1272,15 @@ int dev_change_name(struct net_device *dev, const char *newname)
>
>         netdev_adjacent_rename_links(dev, oldname);
>
> -       write_lock_bh(&dev_base_lock);
> +       write_lock(&dev_base_lock);
>         netdev_name_node_del(dev->name_node);
> -       write_unlock_bh(&dev_base_lock);
> +       write_unlock(&dev_base_lock);
>
>         synchronize_rcu();
>
> -       write_lock_bh(&dev_base_lock);
> +       write_lock(&dev_base_lock);
>         netdev_name_node_add(net, dev->name_node);
> -       write_unlock_bh(&dev_base_lock);
> +       write_unlock(&dev_base_lock);
>
>         ret = call_netdevice_notifiers(NETDEV_CHANGENAME, dev);
>         ret = notifier_to_errno(ret);
> diff --git a/net/core/link_watch.c b/net/core/link_watch.c
> --- a/net/core/link_watch.c
> +++ b/net/core/link_watch.c
> @@ -55,7 +55,7 @@ static void rfc2863_policy(struct net_device *dev)
>         if (operstate == dev->operstate)
>                 return;
>
> -       write_lock_bh(&dev_base_lock);
> +       write_lock(&dev_base_lock);
>
>         switch(dev->link_mode) {
>         case IF_LINK_MODE_TESTING:
> @@ -74,7 +74,7 @@ static void rfc2863_policy(struct net_device *dev)
>
>         dev->operstate = operstate;
>
> -       write_unlock_bh(&dev_base_lock);
> +       write_unlock(&dev_base_lock);
>  }
>
>
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -842,9 +842,9 @@ static void set_operstate(struct net_device *dev, unsigned char transition)
>         }
>
>         if (dev->operstate != operstate) {
> -               write_lock_bh(&dev_base_lock);
> +               write_lock(&dev_base_lock);
>                 dev->operstate = operstate;
> -               write_unlock_bh(&dev_base_lock);
> +               write_unlock(&dev_base_lock);
>                 netdev_state_change(dev);
>         }
>  }
> @@ -2779,11 +2779,11 @@ static int do_setlink(const struct sk_buff *skb,
>         if (tb[IFLA_LINKMODE]) {
>                 unsigned char value = nla_get_u8(tb[IFLA_LINKMODE]);
>
> -               write_lock_bh(&dev_base_lock);
> +               write_lock(&dev_base_lock);
>                 if (dev->link_mode ^ value)
>                         status |= DO_SETLINK_NOTIFY;
>                 dev->link_mode = value;
> -               write_unlock_bh(&dev_base_lock);
> +               write_unlock(&dev_base_lock);
>         }
>
>         if (tb[IFLA_VFINFO_LIST]) {
> diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
> --- a/net/hsr/hsr_device.c
> +++ b/net/hsr/hsr_device.c
> @@ -30,13 +30,13 @@ static bool is_slave_up(struct net_device *dev)
>
>  static void __hsr_set_operstate(struct net_device *dev, int transition)
>  {
> -       write_lock_bh(&dev_base_lock);
> +       write_lock(&dev_base_lock);
>         if (dev->operstate != transition) {
>                 dev->operstate = transition;
> -               write_unlock_bh(&dev_base_lock);
> +               write_unlock(&dev_base_lock);
>                 netdev_state_change(dev);
>         } else {
> -               write_unlock_bh(&dev_base_lock);
> +               write_unlock(&dev_base_lock);
>         }
>  }
>
>
> If this does not create any warnings (with or without RT) then I would
> suggest to remove that _bh suffix since there is no need for it. This
> should also fix the RT warning you see now.
> Agreed?

Sounds like a plan!

I will start testing within the hour and report soon.

Thanks,
Luis

> Sebastian
>




[Index of Archives]     [RT Stable]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]

  Powered by Linux