On Thu, Nov 26, 2020 at 01:07:12PM -0600, George McCollister wrote:
> On Thu, Nov 26, 2020 at 11:56 AM Vladimir Oltean <olteanv@xxxxxxxxx> wrote:
> >
> > On Thu, Nov 26, 2020 at 03:24:18PM +0200, Vladimir Oltean wrote:
> > > On Wed, Nov 25, 2020 at 08:25:11PM -0600, George McCollister wrote:
> > > > > > +     {XRS_RX_UNDERSIZE_L, "rx_undersize"},
> > > > > > +     {XRS_RX_FRAGMENTS_L, "rx_fragments"},
> > > > > > +     {XRS_RX_OVERSIZE_L, "rx_oversize"},
> > > > > > +     {XRS_RX_JABBER_L, "rx_jabber"},
> > > > > > +     {XRS_RX_ERR_L, "rx_err"},
> > > > > > +     {XRS_RX_CRC_L, "rx_crc"},
> > > > >
> > > > > As Vladimir already mentioned to you the statistics which have
> > > > > corresponding entries in struct rtnl_link_stats64 should be reported
> > > > > the standard way. The infra for DSA may not be in place yet, so best
> > > > > if you just drop those for now.
> > > >
> > > > Okay, that clears it up a bit. Just drop these 6? I'll read through
> > > > that thread again and try to make sense of it.
> > >
> > > I feel that I should ask. Do you want me to look into exposing RMON
> > > interface counters through rtnetlink (I've never done anything like that
> > > before either, but there's a beginning for everything), or are you going
> > > to?
> >
> > So I started to add .ndo_get_stats64 based on the hardware counters, but
> > I already hit the first roadblock, as described by the wise words of
> > Documentation/networking/statistics.rst:
> >
> > | The `.ndo_get_stats64` callback can not sleep because of accesses
> > | via `/proc/net/dev`. If driver may sleep when retrieving the statistics
> > | from the device it should do so periodically asynchronously and only return
> > | a recent copy from `.ndo_get_stats64`. Ethtool interrupt coalescing interface
> > | allows setting the frequency of refreshing statistics, if needed.
> >
> >
> > Unfortunately, I feel this is almost unacceptable for a DSA driver that
> > more often than not needs to retrieve these counters from a slow and
> > bottlenecked bus (SPI, I2C, MDIO etc). Periodic readouts are not an
> > option, because the only periodic interval that would not put absurdly
> > high pressure on the limited SPI bandwidth would be a readout interval
> > that gives you very old counters.
>
> Indeed it seems ndo_get_stats64() usually gets data over something
> like a local or PCIe bus or from software. I had a brief look to see
> if I could find another driver that was getting the stats over a slow
> bus and didn't notice anything. If you haven't already you might do a
> quick grep and see if anything pops out to you.
>
> >
> > What exactly is it that incurs the atomic context? I cannot seem to
> > figure out from this stack trace:
>
> I think something in fs/seq_file.c is taking an rcu lock.

Not quite. It _is_ the RCU read-side lock that's taken, but it's taken
locally from dev_seq_start in net/core/net-procfs.c. The reason is that
/proc/net/dev iterates through all interfaces from the current netns,
and it is precisely that that creates atomic context. You used to need
to hold the rwlock_t dev_base_lock, but now you can also "get away" with
the RCU read-side lock. Either way, both are atomic context, so it
doesn't help.

commit c6d14c84566d6b70ad9dc1618db0dec87cca9300
Author: Eric Dumazet <eric.dumazet@xxxxxxxxx>
Date:   Wed Nov 4 05:43:23 2009 -0800

    net: Introduce for_each_netdev_rcu() iterator

    Adds RCU management to the list of netdevices.

    Convert some for_each_netdev() users to RCU version, if
    it can avoid read_lock-ing dev_base_lock

    Ie:
            read_lock(&dev_base_loack);
            for_each_netdev(net, dev)
                    some_action();
            read_unlock(&dev_base_lock);

    becomes :

            rcu_read_lock();
            for_each_netdev_rcu(net, dev)
                    some_action();
            rcu_read_unlock();

    Signed-off-by: Eric Dumazet <eric.dumazet@xxxxxxxxx>
    Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>

So... yeah.
As long as this kernel interface exists, it needs to run in atomic
context, by construction. Great.

> I suppose it doesn't really matter though since the documentation says
> we can't sleep.

You're talking, I suppose, about these words of wisdom in
Documentation/filesystems/seq_file.rst?

| However, the seq_file code (by design) will not sleep between the calls
| to start() and stop(), so holding a lock during that time is a
| reasonable thing to do. The seq_file code will also avoid taking any
| other locks while the iterator is active.

It _doesn't_ say that you can't sleep between start() and stop(), right?
It just says that if you want to keep the seq_file iterator atomic, the
seq_file code is not sabotaging you by sleeping. But you still could
sleep if you wanted to.

Back to the statistics counters. How accurate do the counters in
/proc/net/dev need to be? What programs consume those? Could they be
more out of date than the ones retrieved through rtnetlink?

I'm thinking that maybe we could introduce another ndo, something like
.ndo_get_stats64_blocking, that could be called from all places except
from net/core/net-procfs.c. That one could still call the non-blocking
variant. Then, depending on the answer to the question "how inaccurate
could we reasonably leave /proc/net/dev", we could:
- just return zeroes there
- return the counters cached from the last blocking call

> It does seem to me that this is something that needs to be sorted out
> at the subsystem level and that this driver has been "caught in the
> crossfire". Any guidance on how we could proceed with this driver and
> revisit this when we have answers to these questions at the subsystem
> level would be appreciated if substantial time will be required to
> work this out.

Now seriously, who isn't caught in the crossfire here? Let's do some
brainstorming and it will be quick and painless.
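
To make the second option (counters cached from the last blocking call)
a bit more concrete, here is a rough userspace sketch of the split. It
is not kernel code and not a proposed patch: struct port, struct
port_stats and both function names are hypothetical, the "hardware" is
just a struct in memory, and a real driver would also need a lock around
the cache since the two paths can race.

```c
#include <stdint.h>

/* Hypothetical subset of the counters found in struct rtnl_link_stats64. */
struct port_stats {
	uint64_t rx_packets;
	uint64_t rx_crc_errors;
};

/* Hypothetical per-port state. "hw" stands in for registers that sit
 * behind a slow bus (SPI, I2C, MDIO) in a real switch.
 */
struct port {
	struct port_stats hw;    /* simulated device counters */
	struct port_stats cache; /* snapshot from the last blocking read */
	unsigned int bus_reads;  /* number of simulated bus transfers */
};

/* Blocking variant: in a real driver this may sleep on the bus, so it
 * must only be called from process context. It refreshes the cache.
 */
static void port_get_stats_blocking(struct port *p, struct port_stats *out)
{
	p->cache = p->hw; /* stands in for the slow bus transfer */
	p->bus_reads++;
	*out = p->cache;
}

/* Non-blocking variant: never touches the bus, so it would be usable
 * from atomic context such as the /proc/net/dev iteration. The result
 * may be stale.
 */
static void port_get_stats_atomic(const struct port *p, struct port_stats *out)
{
	*out = p->cache;
}
```

With this split, a reader on the atomic path sees zeroes until the first
blocking call has happened, and afterwards sees whatever the last
blocking caller fetched, at no extra cost on the bus.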