Hi there,
I want to correctly monitor bandwidth/throughput on a NAT-ing IPv4
router over a sliding window of n minutes. After reading the wiki and
the docs, some uncertainties remain.
A named counter could be a first approach:

    table inet filter {
        counter accept_https {}
        chain forward {
            type filter hook forward priority filter; policy accept;
            tcp dport 443 counter name accept_https accept comment "accept https"
        }
    }

The current state can be queried with nft list ruleset | grep counter
(or, more directly, with nft list counter inet filter accept_https).
Such a counter gathers statistics from the moment the ruleset is loaded
and never resets on its own. A delta analysis for a 15-minute window
could be done at a later stage with a little math in a scraping tool.
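
The delta analysis mentioned above could look roughly like the
following Python sketch. It assumes a scraper that samples the
cumulative byte count at regular intervals. The class name
SlidingWindowCounter and its methods are made up for illustration and
are not part of nftables or any library.

```python
from collections import deque


class SlidingWindowCounter:
    """Report the increase of a cumulative counter over the last window_s seconds."""

    def __init__(self, window_s):
        self.window_s = window_s
        # (timestamp_s, cumulative_bytes) pairs, oldest first.
        self.samples = deque()

    def add_sample(self, ts, total_bytes):
        """Record one scrape and discard samples that fell out of the window."""
        self.samples.append((ts, total_bytes))
        start = ts - self.window_s
        # Keep the newest sample at or before the window start as baseline.
        while len(self.samples) >= 2 and self.samples[1][0] <= start:
            self.samples.popleft()

    def bytes_in_window(self):
        """Bytes counted between the baseline sample and the newest sample."""
        if len(self.samples) < 2:
            return 0
        return self.samples[-1][1] - self.samples[0][1]

    def rate_bps(self):
        """Average bits per second over the time span actually covered."""
        if len(self.samples) < 2:
            return 0.0
        span = self.samples[-1][0] - self.samples[0][0]
        return 8 * self.bytes_in_window() / span if span > 0 else 0.0


if __name__ == "__main__":
    window = SlidingWindowCounter(window_s=15 * 60)
    # Hypothetical scrapes: one per minute, 6000 bytes of traffic per minute.
    for minute in range(21):
        window.add_sample(minute * 60, minute * 6000)
    print(window.bytes_in_window(), window.rate_bps())
```

With one scrape per minute the window is precise to the minute; the
resolution is bounded by the scrape interval, not by nftables.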
A set would serve a similar purpose. The following example is more
sophisticated in that it distinguishes individual (internal) IPv4
addresses:

    define private_net = 192.168.2.0/24
    table inet nftmon {
        set ip4counters {
            type ipv4_addr
            size 65535
            flags dynamic
            counter
        }
        chain forward {
            type filter hook postrouting priority filter + 1;
            policy accept;
            ip saddr $private_net add @ip4counters { ip saddr }
            ip daddr $private_net add @ip4counters { ip daddr }
        }
    }

Querying it with

    nft list set inet nftmon ip4counters

is more straightforward, as it lists only the relevant metrics.
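
A scraper could turn that listing into per-address numbers along the
lines of the following Python sketch. The sample text is hand-written
to resemble the listing format rather than captured from a real
router, and the function names are hypothetical. The regular
expression only relies on each element containing the address followed
by "counter packets N bytes M", so extra fields such as a timeout are
skipped.

```python
import re
import subprocess

# One set element: an IPv4 address, then (within the same element, i.e.
# before the next "," or "}") the text "counter packets N bytes M".
ELEMENT_RE = re.compile(
    r"(\d{1,3}(?:\.\d{1,3}){3})\b[^,}]*?counter packets (\d+) bytes (\d+)"
)

# Hand-written sample, assumed to resemble the real listing.
SAMPLE_LISTING = """
table inet nftmon {
    set ip4counters {
        type ipv4_addr
        size 65535
        flags dynamic
        counter
        elements = { 192.168.2.10 counter packets 12 bytes 3456,
                     192.168.2.11 counter packets 3 bytes 180 }
    }
}
"""


def read_set_listing():
    """Run nft (needs root) and return the listing of the set as text."""
    cmd = ["nft", "list", "set", "inet", "nftmon", "ip4counters"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_set_counters(listing):
    """Map each IPv4 address in the listing to a (packets, bytes) tuple."""
    return {
        ip: (int(packets), int(nbytes))
        for ip, packets, nbytes in ELEMENT_RE.findall(listing)
    }


if __name__ == "__main__":
    for ip, (packets, nbytes) in parse_set_counters(SAMPLE_LISTING).items():
        print(ip, packets, nbytes)
```

Parsing the text output keeps the sketch short; nft also has a JSON
output mode (nft -j), which is likely the more robust choice for a
real scraper.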
I could further enhance this by adding the timeout flag to the set
(flags dynamic,timeout) and a 15-minute timeout to the add statement of
the rule that fills the set:

    ip saddr $private_net add @ip4counters { ip saddr timeout 15m }

1. The first approach, with a named counter and diff logic at a later
stage (scraping script, piece of code), moves load away from nftables
to somewhere else. Is this recommended over the set variant with
timeouts? Will a counter overflow and break the subtraction from time
to time? (Uptime is multiple months with substantial traffic.)
2. For the second approach I assume that a single packet matching the
rule creates an entry with a 15m timeout in the set. Thus no entry in
the set is older than 15 minutes, so the metrics from this set only
span a 15-minute interval. Is this correct? Asked from a different
point of view: when does garbage collection take place and clear the
timed-out elements from the set?
3. Can the pure counter approach not be improved with some garbage
collection configuration? Running it every 15 minutes would create
24 x 4 = 96 15-minute intervals per day. Does scraping in between
garbage collection runs mean missing bandwidth/packets?
4. Is there a third, more efficient/cheaper approach to define a rule
or rules that yield bandwidth/throughput metrics grouped by IP (or port,
or whatever the rule matches on) so that only the last n minutes are
taken into consideration? (Precise to the minute.)
5. Querying conntrack would be a later stage, if bandwidth monitoring
reveals unusual activity. I assume a counter or the set approach
requires fewer resources (CPU, memory). Is this correct?
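
Regarding the subtraction in question 1, a scraper can at least be
made robust against a counter that goes backwards. The following
Python sketch assumes the counters are unsigned 64-bit values; that
width is an assumption here, and the function names and the
plausibility threshold are hypothetical. A lower reading is treated
either as a wrap-around or, if the wrapped difference is implausibly
large, as a reset (for example after reloading the ruleset).

```python
# Assumption: counters are unsigned 64-bit and wrap to 0 after 2**64 - 1.
COUNTER_MODULUS = 2 ** 64


def counter_delta(prev, curr, max_plausible=2 ** 63):
    """Return the increase between two scrapes of a cumulative counter.

    If curr < prev, the counter either wrapped around or was reset.
    A wrapped difference above max_plausible is treated as a reset, in
    which case only the bytes counted since the reset are returned.
    """
    if curr >= prev:
        return curr - prev
    wrapped = curr + COUNTER_MODULUS - prev
    if wrapped > max_plausible:
        return curr  # counter restarted from zero
    return wrapped


def seconds_until_wrap(rate_bits_per_s):
    """Time for a byte counter starting at 0 to wrap at a constant rate."""
    return COUNTER_MODULUS / (rate_bits_per_s / 8)


if __name__ == "__main__":
    years = seconds_until_wrap(10e9) / (365.25 * 86400)
    print(f"wrap at a sustained 10 Gbit/s after about {years:.0f} years")
```

Under the 64-bit assumption a wrap is far beyond any realistic uptime,
so a reading that goes down almost always means the counter was reset.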
Thanks in advance,
Benno