On Wed, 11 Oct 2023 18:17:42 +0200 Benno <b.ohnsorg@xxxxxxxxxx> wrote:

> Hi there,
>
> I want to monitor bandwidth/throughput (on a NAT-ing IPv4 router) in a
> sliding window of n minutes correctly. Just from the wiki or the docs,
> some uncertainties remain.
>
> A named counter could be a first approach:
>
> table inet filter {
>     counter accept_https {}
>
>     chain input {
>         type filter hook input priority filter; policy accept;
>         tcp dport 443 counter name accept_https accept comment "accept https"
>     }
> }
>
> The current state is to be queried with nft list ruleset | grep counter.

nft list counters inet would make more sense (--json is also supported).

> Such a counter will gather statistics from the moment the ruleset is
> loaded until eternity. A delta analysis for a 15min window could be
> solved at a later stage by a little math in a scraping tool.
>
> A set would serve a similar purpose. This example is already more
> sophisticated, distinguishing between (internal) IP(v4) addresses:
>
> define private_net = 192.168.2.0/24
>
> table inet nftmon {
>     set ip4counters {
>         type ipv4_addr
>         size 65535
>         flags dynamic
>         counter
>     }
>
>     chain forward {
>         type filter hook postrouting priority filter + 1;
>         policy accept;
>         ip saddr $private_net add @ip4counters { ip saddr }
>         ip daddr $private_net add @ip4counters { ip daddr }
>     }
> }
>
> Querying it with:
>
> nft list set inet nftmon ip4counters
>
> is more straightforward, listing only the relevant metrics.

In the absence of a timeout, it is probable that the set will become
full.

> I could further enhance this with flags timeout for the set, and add a
> timeout of 15min in the add part of the rule filling the set:
>
> ip saddr $private_net add @ip4counters { ip saddr timeout 15m }
>
> 1. The first approach, with a named counter and diff logic at a later
> stage (scraping script, piece of code), moves load from nftables
> elsewhere. Is this recommended in comparison to the timeout-flagged
> set variant? Will a counter overflow and break the subtraction from
> time to time?
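For what it's worth, the "little math" of the diff logic amounts to no
more than subtracting consecutive samples, provided that a possible
wraparound is accounted for. Below is a rough sketch in Python,
assuming - and I have not verified this - that the counters wrap
modulo 2^63.

```python
# Assumed, not verified: kernel counters wrap modulo 2^63, that is,
# they count up to 9223372036854775807 before starting over at 0.
MODULUS = 1 << 63

def delta(prev: int, cur: int) -> int:
    """Bytes (or packets) accrued between two samples, tolerating at
    most one wraparound of the underlying counter in between."""
    return (cur - prev) % MODULUS

# Two samples taken 15 minutes apart ...
print(delta(1000, 4500))          # 3500
# ... and a pair of samples that straddles a wraparound.
print(delta((1 << 63) - 10, 40))  # 50
```

Should the counters turn out to wrap at a different bound, only the
value of MODULUS would need to be adjusted.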
> (Uptime is multiple months, with sufficient traffic.)

The manual indicates that they are signed 64-bit integers, which is
quite generous. However, I am uncertain as to how they wrap around. My
guess would be that they go as far as 9223372036854775807 ((1 << 63) - 1)
before wrapping around to 0, because wrapping around to a negative
number would be confusing. I would appreciate confirmation from a
Netfilter developer, one way or the other. Once the wrapping behaviour
is confirmed, detecting such wraps should be straightforward.

> 2. For the 2nd approach, I assume the single packet matching the rule
> will end up with a 15m timeout in the set. Thus, no entry in the set
> is older than 15min, so the metrics from this set only span a 15min
> interval. Is this correct? Asked from a different point of view: when
> will garbage collection take place, clearing the timed-out values
> from the set?

Yes, this is correct. Once an element has timed out, it shall be
unceremoniously removed. Further, the timeout value may also be defined
by the set itself. A potential issue with this approach is that your
data collector ends up racing with the exact time at which a given
element is created and/or the exact time at which it is removed.

> 3. The pure counter approach cannot be improved with a garbage
> collection configuration? This would create 24x4 15min intervals when
> running every 15min. Does scraping this in between garbage collection
> runs mean missing bandwidth/packets?

I'm not sure that I understand what a garbage collection configuration
would entail. Do you mean for your collector to reset the counters
after collecting their values? If so, I would not expect for anything
to be missed, provided that the reset command is issued via the same
invocation of nft(8) that instructs it to print the counters.

nft -j 'list counters inet; reset counters inet'

Having said that, I may have uncovered a bug in the course of trying
this. I shall explore the matter further.

> 4.
> Is there a third, more efficient/cheaper approach: to define a rule
> or rules that yield bandwidth/throughput metrics grouped by IP (or
> port, or whatever the rule is made of), so that only the last n
> minutes are taken into consideration? (Precise to the minute.)

Short of implementing a custom tool in userspace, I don't think so.
Sets are particularly powerful, as they allow for elements to be a
tuple of varying data types.

> 5. Querying conntrack would be a later stage, if bandwidth monitoring
> yields unusual activity. A counter or the set approach requires fewer
> resources (CPU, memory). Is this correct?

Based on my own experience, I would expect for conntrack(8) scraping to
be less efficient.

-- 
Kerin Millar
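P.S. In case it is useful, here is a rough sketch of how a collector
might digest the JSON emitted by nft -j list counters. The document
shape assumed below - a top-level "nftables" array containing
"counter" objects with "table", "name", "packets" and "bytes" members
- matches what my copy of nft(8) prints, but do verify it against
yours. The sample document is fabricated for illustration.

```python
import json

def parse_counters(doc: str) -> dict:
    """Map (table, counter name) to (packets, bytes) for every named
    counter found in the JSON output of `nft -j list counters`."""
    result = {}
    for obj in json.loads(doc).get("nftables", []):
        counter = obj.get("counter")
        if counter is not None:
            key = (counter["table"], counter["name"])
            result[key] = (counter["packets"], counter["bytes"])
    return result

# A fabricated sample, shaped like the output that I see locally.
sample = """
{"nftables": [
  {"metainfo": {"version": "1.0.2", "json_schema_version": 1}},
  {"counter": {"family": "inet", "name": "accept_https",
               "table": "filter", "handle": 1,
               "packets": 12, "bytes": 3456}}
]}
"""
print(parse_counters(sample))  # {('filter', 'accept_https'): (12, 3456)}
```

In practice, the document would be obtained by running nft -j via
subprocess rather than from a literal string.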