Efficient and correct time based bandwidth monitoring

Hi there,

I want to correctly monitor bandwidth/throughput (on a NATing IPv4 router) over a sliding window of n minutes. After reading the wiki and the docs, some uncertainties remain.
A named counter could be a first approach:

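table inet filter {
  counter accept_https {}

  # chain name and hook are assumed here; the excerpt omitted the enclosing chain
  chain forward {
    type filter hook forward priority filter; policy accept;
    tcp dport 443 counter name accept_https accept comment "accept https"
  }
}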

The current state can be queried with nft list ruleset | grep counter. Such a counter accumulates statistics from the moment the ruleset is loaded, indefinitely. A delta for a 15-minute window could be computed at a later stage by a scraping tool with a little arithmetic.
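The "little arithmetic" of such a scraping tool could look roughly like the following. This is only a minimal sketch: the class name SlidingWindowCounter and its methods are made up for illustration, the readings are assumed to be the absolute byte values scraped from the named counter, and a reading that goes backwards is assumed to mean the counter was reset (for example by a ruleset reload).

```python
from collections import deque


class SlidingWindowCounter:
    """Turn readings of an ever-growing counter into a sliding-window total."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.deltas = deque()    # (timestamp, bytes added since previous reading)
        self.last_value = None   # last absolute counter reading

    def add_sample(self, timestamp, value):
        if self.last_value is None:
            delta = 0                      # first reading is only a baseline
        elif value >= self.last_value:
            delta = value - self.last_value
        else:
            delta = value                  # assumption: counter restarted from zero
        self.last_value = value
        self.deltas.append((timestamp, delta))
        self._expire(timestamp)

    def _expire(self, now):
        # Drop deltas whose reading time is no longer inside the window.
        while self.deltas and self.deltas[0][0] <= now - self.window:
            self.deltas.popleft()

    def total(self, now):
        """Bytes seen during the last `window` seconds."""
        self._expire(now)
        return sum(delta for _, delta in self.deltas)

    def rate(self, now):
        """Average bytes per second over the window."""
        return self.total(now) / self.window


# Example: 15-minute window, readings taken at arbitrary times (seconds).
w = SlidingWindowCounter(900)
w.add_sample(0, 1000)
w.add_sample(60, 1600)
w.add_sample(120, 2000)
print(w.total(120))
```

The precision of the window is bounded by the scrape interval: each delta is attributed to the time of its reading, so scraping once per minute gives results that are precise to roughly one minute.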
A set would serve a similar purpose. The following example is already more sophisticated, since it distinguishes (internal) IPv4 addresses:
define private_net =

table inet nftmon {
        set ip4counters {
                type ipv4_addr
                size 65535
                flags dynamic
                # assumed: per-element counters, so that listing the set shows packets/bytes
                counter
        }

        chain forward {
                type filter hook postrouting priority filter + 1; policy accept;
                ip saddr $private_net add @ip4counters { ip saddr }
                ip daddr $private_net add @ip4counters { ip daddr }
        }
}

Querying it with:

nft list set inet nftmon ip4counters

is more straightforward, as it lists only the relevant metrics.
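For a scraping tool, the JSON output of the same query (nft -j list set inet nftmon ip4counters) is probably easier to process than the text form. The following is a rough sketch only: the function name parse_set_counters is made up, and the JSON layout of elements with counters is an assumption that should be checked against the real output of the nft version in use. The sample data is hypothetical, not captured from a real system.

```python
import json


def parse_set_counters(nft_json_text):
    """Map each set element (e.g. an IPv4 address) to its (packets, bytes)."""
    doc = json.loads(nft_json_text)
    totals = {}
    for item in doc.get("nftables", []):
        nft_set = item.get("set")
        if nft_set is None:
            continue  # skip metainfo and other object types
        for entry in nft_set.get("elem", []):
            # Assumed layout: elements carrying state are wrapped as
            # {"elem": {"val": ..., "counter": {"packets": ..., "bytes": ...}}}
            if not isinstance(entry, dict) or "elem" not in entry:
                continue
            elem = entry["elem"]
            value = elem.get("val")
            counter = elem.get("counter")
            if isinstance(value, str) and counter is not None:
                totals[value] = (counter["packets"], counter["bytes"])
    return totals


# Hypothetical sample in the assumed layout.
SAMPLE = """
{"nftables": [
  {"metainfo": {"json_schema_version": 1}},
  {"set": {"family": "inet", "table": "nftmon", "name": "ip4counters",
           "type": "ipv4_addr", "size": 65535, "flags": ["dynamic"],
           "elem": [
             {"elem": {"val": "192.168.1.10",
                       "counter": {"packets": 10, "bytes": 840}}},
             {"elem": {"val": "192.168.1.11",
                       "counter": {"packets": 3, "bytes": 180}}}
           ]}}
]}
"""

for address, (packets, byte_count) in parse_set_counters(SAMPLE).items():
    print(address, packets, byte_count)
```

In a real scraper the text would come from running the nft command, for instance through subprocess.run with capture_output=True, instead of from SAMPLE.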

I could further enhance this by adding flags timeout to the set and a 15-minute timeout to the add statement of the rule that fills the set:
ip saddr $private_net add @ip4counters { ip saddr timeout 15m }

1. The first approach, a named counter with diff logic in a later stage (scraping script, piece of code), moves load from nftables to somewhere else. Is this recommended over the set variant with timeouts? Will a counter overflow and break the subtraction from time to time? (Uptime is multiple months with sufficient traffic.)
2. For the second approach, I assume that each packet matching the rule ends up in the set with a 15-minute timeout. Thus no entry in the set is older than 15 minutes, and the metrics from this set span only a 15-minute interval. Is this correct? Asked from a different point of view: when does garbage collection take place and clear the timed-out entries from the set?
3. Can the pure counter approach not be improved with a garbage collection configuration? This would create 24x4 15-minute intervals per day when running every 15 minutes. Does scraping in between garbage collection runs mean missing bandwidth/packets?

4. Is there a third, more efficient/cheaper approach to define a rule or rules that yield bandwidth/throughput metrics grouped by IP (or port, or whatever the rule is made of) so that only the last n minutes are taken into consideration? (Precise to the minute.)
5. Querying conntrack would be a later stage, if bandwidth monitoring reveals unusual activity. A counter or the set approach requires fewer resources (CPU, memory). Is this correct?
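On the overflow part of question 1, a back-of-the-envelope estimate may help. It is only a sketch under the assumption that the byte counters are unsigned 64-bit values (to be confirmed for the kernel in use); the function names and the traffic rate of 1.25e9 bytes per second (about 10 Gbit/s) are picked purely for illustration.

```python
COUNTER_BITS = 64  # assumption: byte counters are 64 bits wide


def years_until_wrap(bytes_per_second, bits=COUNTER_BITS):
    """Time until a counter of the given width wraps at a constant rate."""
    seconds = (2 ** bits) / bytes_per_second
    return seconds / (365.25 * 24 * 3600)


def wrap_safe_delta(current, previous, bits=COUNTER_BITS):
    """Difference of two readings that stays correct across a single wrap."""
    return (current - previous) % (2 ** bits)


# Sustained ~10 Gbit/s is 1.25e9 bytes per second.
print(round(years_until_wrap(1.25e9)))      # centuries for a 64-bit counter
print(wrap_safe_delta(5, 2 ** 64 - 10))     # reading wrapped past zero
```

Under that assumption a wrap within months of uptime is not a realistic concern, so a reading that goes backwards more likely indicates a counter reset than an overflow. The modular subtraction only matters if the counter width turns out to be much smaller.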

Thanks in advance,

