Efficient and correct time based bandwidth monitoring

Hi there,

I want to correctly monitor bandwidth/throughput (on a NAT-ing IPv4 router) over a sliding window of n minutes. After reading the wiki and the docs, some uncertainties remain.

A named counter could be a first approach:

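table inet filter {
  counter accept_https {}

  # A rule has to live inside a chain; the input hook is only chosen for illustration.
  chain input {
    type filter hook input priority filter; policy accept;
    tcp dport 443 counter name accept_https accept comment "accept https"
  }
}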

The current state can be queried with nft list ruleset | grep counter, or more directly with nft list counter inet filter accept_https. Such a counter accumulates statistics from the moment the ruleset is loaded and is never reset on its own. The delta for a 15min window could then be computed in a later stage by a scraping tool that subtracts two readings.

A set would serve a similar purpose. This example is already more sophisticated, as it distinguishes individual (internal) IPv4 addresses:
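
The "little math" of such a scraping tool could look roughly like the following Python sketch. It is not part of nftables; the Sample type, the window_usage function and the sample values are hypothetical, and the 64-bit counter width is an assumption that should be checked against the kernel in use.

```python
from typing import NamedTuple

# Assumption: counters are 64 bits wide. Modular subtraction then survives a wrap.
COUNTER_MODULUS = 1 << 64


class Sample(NamedTuple):
    timestamp: float  # seconds, e.g. from time.time()
    packets: int      # cumulative packet count read from the counter
    bytes: int        # cumulative byte count read from the counter


def window_usage(old: Sample, new: Sample) -> dict:
    """Traffic between two readings of the same cumulative counter."""
    elapsed = new.timestamp - old.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    # Caveat: a counter reset (e.g. ruleset reload) also makes new < old and
    # would yield a huge bogus delta here; a real tool has to detect that.
    delta_bytes = (new.bytes - old.bytes) % COUNTER_MODULUS
    delta_packets = (new.packets - old.packets) % COUNTER_MODULUS
    return {
        "bytes": delta_bytes,
        "packets": delta_packets,
        "bits_per_second": delta_bytes * 8 / elapsed,
    }


# Two made-up readings taken 15 minutes (900 s) apart.
old = Sample(timestamp=0.0, packets=1_000, bytes=1_500_000)
new = Sample(timestamp=900.0, packets=4_000, bytes=114_000_000)
usage = window_usage(old, new)

# Made-up readings on either side of a wrap.
wrapped = window_usage(Sample(0.0, 0, COUNTER_MODULUS - 100), Sample(60.0, 1, 50))
```

With these readings, usage["bits_per_second"] is the average rate over the 15 minutes between the two samples, not a peak value.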

define private_net = 192.168.2.0/24

table inet nftmon {
        set ip4counters {
                type ipv4_addr
                size 65535
                flags dynamic
                counter
        }

        chain forward {
                type filter hook postrouting priority filter + 1; policy accept;
                ip saddr $private_net add @ip4counters { ip saddr }
                ip daddr $private_net add @ip4counters { ip daddr }
        }
}

Querying it with:

nft list set inet nftmon ip4counters

is more straightforward, as it lists only the relevant metrics.
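
For a scraping tool, the JSON output (nft -j list set inet nftmon ip4counters) is probably easier to process than the text output. The following Python sketch extracts the byte count per address. The SAMPLE document is hand-written and only reflects my assumption of what the JSON looks like, so the field names should be verified against real output; bytes_per_address is a hypothetical helper.

```python
import json

# Hand-written sample; the exact shape of the nft JSON output is an assumption.
SAMPLE = """
{"nftables": [
  {"metainfo": {"json_schema_version": 1}},
  {"set": {"family": "inet", "name": "ip4counters", "table": "nftmon",
           "type": "ipv4_addr", "flags": ["dynamic"], "size": 65535,
           "elem": [
             {"elem": {"val": "192.168.2.10",
                       "counter": {"packets": 12, "bytes": 3456}}},
             {"elem": {"val": "192.168.2.11",
                       "counter": {"packets": 3, "bytes": 789}}}
           ]}}
]}
"""


def bytes_per_address(nft_json: str) -> dict:
    """Map each set element (IP address) to its cumulative byte count."""
    result = {}
    for item in json.loads(nft_json).get("nftables", []):
        for entry in item.get("set", {}).get("elem", []):
            # Elements without attached state may be plain strings; skip those.
            elem = entry.get("elem", {}) if isinstance(entry, dict) else {}
            counter = elem.get("counter")
            if counter is not None:
                result[elem["val"]] = counter["bytes"]
    return result


totals = bytes_per_address(SAMPLE)
```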

I could further enhance this by adding the timeout flag to the set (flags dynamic, timeout) and a timeout of 15min to the add statement of the rule filling the set:

ip saddr $private_net add @ip4counters { ip saddr timeout 15m }

1. The first approach, a named counter plus diff logic in a later stage (scraping script, piece of code), moves load from nftables to somewhere else. Is this recommended over the set variant with the timeout flag? Will a counter overflow and break the subtraction from time to time? (Uptime is multiple months with sufficient traffic.)

2. For the second approach I assume that every single packet matching the rule ends up in the set with a 15m timeout. Thus no entry in the set is older than 15min, and the metrics from this set only span a 15min interval. Is this correct? Asked from a different point of view: when does garbage collection take place and clear the timed-out values from the set?

3. Can the pure counter approach not be improved with a garbage collection configuration? Running every 15min, this would create 24x4 15min intervals per day. Does scraping in between garbage collection runs mean missing bandwidth/packets?


4. Is there a third, more efficient/cheaper approach to define a rule or rules that yield bandwidth/throughput metrics grouped by IP (or port, or whatever the rule is made of) so that only the last n minutes are taken into consideration? (Precise to the minute.)

5. Querying conntrack would be a later stage, if bandwidth monitoring reveals unusual activity. A counter or the set approach requires fewer resources (CPU, memory). Is this correct?
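
To make clear what I mean by a sliding window grouped by IP in question 4, here is how I would emulate it in userspace if nftables cannot do it directly. This is only a Python sketch: the PerKeyWindow class and the snapshot values are hypothetical, and it assumes one snapshot of the cumulative per-IP byte counts is taken per minute.

```python
from collections import deque


class PerKeyWindow:
    """Sliding-window usage per key, built from cumulative counter snapshots."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.snapshots = deque()  # entries: (timestamp, {key: cumulative_bytes})

    def add_snapshot(self, timestamp, totals):
        self.snapshots.append((timestamp, dict(totals)))
        # Keep exactly one snapshot at or before the window start as baseline;
        # everything older than that is dropped.
        cutoff = timestamp - self.window_seconds
        while len(self.snapshots) >= 2 and self.snapshots[1][0] <= cutoff:
            self.snapshots.popleft()

    def usage(self):
        """Bytes per key between the baseline and the latest snapshot."""
        if len(self.snapshots) < 2:
            return {}
        baseline = self.snapshots[0][1]
        latest = self.snapshots[-1][1]
        # A key missing from the baseline is assumed to have started at zero.
        return {key: total - baseline.get(key, 0) for key, total in latest.items()}


# Made-up snapshots, one per minute, with a 2-minute window for brevity.
window = PerKeyWindow(window_seconds=120)
window.add_snapshot(0, {"192.168.2.10": 100})
window.add_snapshot(60, {"192.168.2.10": 250, "192.168.2.11": 40})
window.add_snapshot(120, {"192.168.2.10": 300, "192.168.2.11": 90})
window.add_snapshot(180, {"192.168.2.10": 500, "192.168.2.11": 90})
recent = window.usage()
```

The precision of such a window is bounded by the snapshot interval, which is why one snapshot per minute would match the "precise to the minute" requirement.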


Thanks in advance,

Benno