Bogus network data in /proc

I suspect the answer will be "it is what it is", but here's the deal. I have a tool I use for monitoring network traffic, among other things (see http://collectl.sourceforge.net/), and one of its benefits is that you can run it continuously as a daemon (similar to sar) and generate data in a format suitable for plotting. This means you can automate your entire network monitoring infrastructure at fairly fine granularity, down to one second if you like. Actually, 1-second monitoring will report incorrect data on earlier kernels because the stats aren't updated on 1-second boundaries, so you need to monitor at an interval of 0.9765 seconds instead, but that's a different story, explained at http://collectl.sourceforge.net/NetworkStats.html

But more importantly, I've found that occasionally (not that often) bogus data is reported from /proc/net/dev. I don't have a lot of details on this, but it seems to show up only on 10G network interfaces. Look at the following samples, taken at 1-second intervals:

eth0:135115809 1024897 0 0 0 0 0 9 135458926 910340 0 0 0 0 0 0
eth0:135118023 1024923 0 0 0 0 0 9 135460952 910363 0 0 0 0 0 0
eth0: 0 884620 0 0 0 0 0 909397 9687563 1049736 0 0 0 0 0 0
eth0:135121189 1024957 0 0 0 0 0 9 135464222 910400 0 0 0 0 0 0
eth0:135129565 1024995 0 0 0 0 0 9 135473687 910435 0 0 0 0 0 0

See the middle sample? When I compute the change between samples, it generates a really big number, since the drop is assumed to be caused by a counter wrapping. However, I've also seen examples where the incorrect numbers are non-zero. The problem is that it's not always obvious when the data is bad. For example, if the original and bogus values are close enough, it's not even clear there's a problem.
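To make the effect concrete, here is a minimal Python sketch of the delta calculation applied to the first field (receive bytes) of the five samples. The 32-bit counter width and the `delta` helper are assumptions for illustration only; this is not how collectl itself is implemented.

```python
# Receive-byte counters (first field) from the five samples quoted above.
rx_bytes = [135115809, 135118023, 0, 135121189, 135129565]

# Assumption: the counter is treated as 32 bits wide when it appears to wrap.
WRAP = 2**32


def delta(prev, cur, wrap=WRAP):
    """Change between two samples, treating any decrease as a counter wrap."""
    if cur >= prev:
        return cur - prev
    return cur + wrap - prev


deltas = [delta(a, b) for a, b in zip(rx_bytes, rx_bytes[1:])]
for d in deltas:
    print(d)
```

Under these assumptions the second and third intervals come out as roughly 4.16 billion and 135 million bytes, next to a few thousand bytes for the intervals on either side.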

So the obvious question is: is there any way to prevent the bogus data from being reported? If not, is there any way to set the values to something that indicates the correct values can't be determined? Clearly this problem would be visible to any tool that reads /proc. As for the counter update frequency, even though the counters now appear to be updated closer to a 1-second boundary, that also means tools that monitor at sub-second intervals will report incorrect data, since the counters only change once a second.
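Short of a fix on the kernel side, one possible user-space workaround is to look one sample ahead: a counter that drops and then immediately returns to at least its earlier value is unlikely to be a real wrap. The sketch below is only a heuristic under that assumption, and `suspect_indices` is a hypothetical helper name. It delays reporting by one interval, and it will not catch the harder case mentioned above, where the bogus value is close to the real one.

```python
def suspect_indices(values):
    """Indices of samples that dip below the previous sample while the
    following sample is back at or above that previous value."""
    suspects = []
    for i in range(1, len(values) - 1):
        if values[i] < values[i - 1] <= values[i + 1]:
            suspects.append(i)
    return suspects


# Receive-byte counters (first field) from the five samples in this message.
rx_bytes = [135115809, 135118023, 0, 135121189, 135129565]
print(suspect_indices(rx_bytes))
```

For these samples only the middle one (index 2) is flagged. A genuine wrap would not be flagged, because the sample after it stays below the pre-wrap value.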

-mark

