On Sat, Nov 28, 2020 at 01:39:16AM +0200, Vladimir Oltean wrote:
> On Sat, Nov 28, 2020 at 12:30:48AM +0100, Andrew Lunn wrote:
> > > If there is a better alternative I'm all ears but having /proc and
> > > ifconfig return zeros for error counts while ip link doesn't will lead
> > > to too much confusion IMO. While delayed update of stats is a fact of
> > > life for _years_ now (hence it was backed into the ethtool -C API).
> >
> > How about dev_seq_start() issues a netdev notifier chain event, asking
> > devices which care to update their cached rtnl_link_stats64 counters.
> > They can decide if their cache is too old, and do a blocking read for
> > new values.
> >
> > Once the notifier has completed, dev_seq_start() can then
> > rcu_read_lock() and do the actual collection of stats from the drivers
> > non-blocking.
>
> That sounds smart. I can try to prototype that and see how well it
> works, or do you want to?

The situation is like this:

static int call_netdevice_notifiers_info(unsigned long val,
					 struct netdev_notifier_info *info);

expects a non-NULL info->dev argument.

To get a net device you need to call:

#define for_each_netdev(net, d)		\
		list_for_each_entry(d, &(net)->dev_base_head, dev_list)

which has the following protection rules:

/*
 * The @dev_base_head list is protected by @dev_base_lock and the rtnl
 * semaphore.
 *
 * Pure readers hold dev_base_lock for reading, or rcu_read_lock()
 *
 * Writers must hold the rtnl semaphore while they loop through the
 * dev_base_head list, and hold dev_base_lock for writing when they do the
 * actual updates.  This allows pure readers to access the list even
 * while a writer is preparing to update it.
 *
 * To put it another way, dev_base_lock is held for writing only to
 * protect against pure readers; the rtnl semaphore provides the
 * protection against other writers.
 *
 * See, for example usages, register_netdevice() and
 * unregister_netdevice(), which must be called with the rtnl
 * semaphore held.
 */

This means, as far as I understand, 2 things:
1. call_netdevice_notifiers_info doesn't help, since our problem is the
   same: we still need to iterate through the net devices to get the
   info->dev to pass to it.
2. I think that holding the RTNL should also be a valid way to iterate
   through the net devices in the current netns, and doing just that could
   be the simplest way out. It certainly worked when I tried it. But those
   could also be famous last words...
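
To make point 2 a bit more concrete, here is a rough userspace sketch. It is
not kernel code. It assumes a pthread mutex as a stand-in for the RTNL and a
plain linked list in place of dev_base_head. struct model_dev,
model_refresh_stats() and model_collect_rx_errors() are made-up names for
illustration only. All it is meant to show is that a walk protected by a
mutex may call something that blocks, which a walk under rcu_read_lock() may
not.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the RTNL: a sleepable lock, unlike RCU. */
static pthread_mutex_t model_rtnl = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical, heavily simplified model of a net device. */
struct model_dev {
	const char *name;
	unsigned long hw_rx_errors;	/* counter "in hardware", slow to read */
	unsigned long cached_rx_errors;	/* what a non-blocking reader would see */
	struct model_dev *next;		/* stand-in for the dev_list linkage */
};

/*
 * Stand-in for a blocking counter read (e.g. over a slow bus).
 * In this model it is only safe because the caller holds a mutex
 * rather than sitting in an RCU read-side critical section.
 */
static void model_refresh_stats(struct model_dev *dev)
{
	dev->cached_rx_errors = dev->hw_rx_errors;
}

/*
 * Walk every device with the model RTNL held, refresh its cached
 * counter, and return the sum of the refreshed rx error counts.
 */
static unsigned long model_collect_rx_errors(struct model_dev *head)
{
	unsigned long total = 0;
	struct model_dev *d;

	pthread_mutex_lock(&model_rtnl);
	for (d = head; d != NULL; d = d->next) {
		model_refresh_stats(d);		/* may block in the real world */
		total += d->cached_rx_errors;
	}
	pthread_mutex_unlock(&model_rtnl);

	return total;
}
```

The obvious cost of doing it this way is that every reader of the stats
serializes on the same lock as the writers, which is part of why this could
turn out to be famous last words.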