On Sat, Nov 28, 2020 at 01:39:12AM +0100, Andrew Lunn wrote: > > This means, as far as I understand, 2 things: > > 1. call_netdevice_notifiers_info doesn't help, since our problem is the > > same > > 2. I think that holding the RTNL should also be a valid way to iterate > > through the net devices in the current netns, and doing just that > > could be the simplest way out. It certainly worked when I tried it. > > But those could also be famous last words... > > DSA makes the assumption it can block in a notifier change event. For > example, NETDEV_CHANGEUPPER calls dsa_slave_changeupper() which calls > into the driver. We have not seen any warnings about sleeping while > atomic. So maybe see how NETDEV_CHANGEUPPER is issued. Yes, this is in line with what I said. Even though we are pure readers, it is still sufficient (albeit not necessary) to hold rtnl in order to safely iterate through net->dev_base_head (or in this case, through its hashmaps per name and per ifindex). However, rtnl is the only one that offers sleepable context, since it is the only one that is a mutex. So the patch could simply look like this, no notifiers needed: -----------------------------[cut here]----------------------------- diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c index c714e6a9dad4..1429e8c066d8 100644 --- a/net/core/net-procfs.c +++ b/net/core/net-procfs.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 #include <linux/netdevice.h> #include <linux/proc_fs.h> +#include <linux/rtnetlink.h> #include <linux/seq_file.h> #include <net/wext.h> @@ -21,7 +22,7 @@ static inline struct net_device *dev_from_same_bucket(struct seq_file *seq, loff unsigned int count = 0, offset = get_offset(*pos); h = &net->dev_index_head[get_bucket(*pos)]; - hlist_for_each_entry_rcu(dev, h, index_hlist) { + hlist_for_each_entry(dev, h, index_hlist) { if (++count == offset) return dev; } @@ -51,9 +52,9 @@ static inline struct net_device *dev_from_bucket(struct seq_file *seq, loff_t *p * in detail. */ static void *dev_seq_start(struct seq_file *seq, loff_t *pos) - __acquires(RCU) + __acquires(rtnl_mutex) { - rcu_read_lock(); + rtnl_lock(); if (!*pos) return SEQ_START_TOKEN; @@ -70,9 +71,9 @@ static void *dev_seq_next(struct seq_file *seq, void *v, loff_t *pos) } static void dev_seq_stop(struct seq_file *seq, void *v) - __releases(RCU) + __releases(rtnl_mutex) { - rcu_read_unlock(); + rtnl_unlock(); } static void dev_seq_printf_stats(struct seq_file *seq, struct net_device *dev) -----------------------------[cut here]----------------------------- The advantage is that it can now sleep. The disadvantage is that now it can sleep. It is a writer, so other writers can block it out, and it can block out other writers. More contention. Speaking of, here's an interesting patch from someone who seems to enjoy running ifconfig: commit f04565ddf52e401880f8ba51de0dff8ba51c99fd Author: Mihai Maruseac <mihai.maruseac@xxxxxxxxx> Date: Thu Oct 20 20:45:10 2011 +0000 dev: use name hash for dev_seq_ops Instead of using the dev->next chain and trying to resync at each call to dev_seq_start, use the name hash, keeping the bucket and the offset in seq->private field. Tests revealed the following results for ifconfig > /dev/null * 1000 interfaces: * 0.114s without patch * 0.089s with patch * 3000 interfaces: * 0.489s without patch * 0.110s with patch * 5000 interfaces: * 1.363s without patch * 0.250s with patch * 128000 interfaces (other setup): * ~100s without patch * ~30s with patch Signed-off-by: Mihai Maruseac <mmaruseac@xxxxxxxxxxx> Signed-off-by: Eric Dumazet <eric.dumazet@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> Jakub, I would like to hear more from you. I would still like to try this patch out. You clearly have a lot more background with the code. You said in an earlier reply that you should have also documented that ndo_get_stats64 is one of the few NDOs that does not take the RTNL. Is there a particular reason for that being so, and a reason why it can't change?