Re: Efficient and correct time based bandwidth monitoring

On Wed, 11 Oct 2023 18:17:42 +0200
Benno <b.ohnsorg@xxxxxxxxxx> wrote:

> Hi there,
> 
> I want to monitor bandwidth/ throughput (on a NAT-ing IPv4 router) in a 
> sliding window of n minutes correctly. Just from the wiki or the docs 
> some uncertainties remain.
> 
> Named counter could be a first approach:
> 
> table inet filter {
> 
>    counter accept_https {}
> 
>    tcp dport 443 counter name accept_https accept comment "accept https"
> }
> 
> Current state is to be queried with nft list ruleset | grep counter.

nft list counters inet would make more sense (--json is also supported).
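For the scraping side, here is a minimal Python sketch of reading that JSON output. It assumes the layout described in libnftables-json(5), where each named counter appears as a "counter" object carrying "family", "table", "name", "packets" and "bytes". The function name parse_counters and the sample document are made up for illustration; real output would come from running nft with --json.

```python
import json


def parse_counters(nft_json: str) -> dict:
    """Map (family, table, name) -> (packets, bytes) for every named counter."""
    result = {}
    for item in json.loads(nft_json).get("nftables", []):
        counter = item.get("counter")
        if counter is None:
            # Skip metainfo and any other object types in the listing.
            continue
        key = (counter["family"], counter["table"], counter["name"])
        result[key] = (counter["packets"], counter["bytes"])
    return result


# Illustrative sample only; the values are invented.
SAMPLE = """
{"nftables": [
  {"metainfo": {"json_schema_version": 1}},
  {"counter": {"family": "inet", "table": "filter", "name": "accept_https",
               "handle": 2, "packets": 12, "bytes": 3456}}
]}
"""

if __name__ == "__main__":
    for key, (packets, nbytes) in parse_counters(SAMPLE).items():
        print(key, packets, nbytes)
```

Keying on (family, table, name) keeps counters with the same name in different tables apart.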

> Such a counter will gather statistics from start of loading the ruleset 
> until eternity. A delta analysis for a 15min window could be solved in a 
> later stage by a little math of a scraping tool.
> 
> A set would serve a similar purpose. This example is already more 
> sophisticated to distinguish (internal) IP(v4) addresses:
> 
> define private_net = 192.168.2.0/24
> 
> table inet nftmon {
>          set ip4counters {
>                  type ipv4_addr
>                  size 65535
>                  flags dynamic
>                  counter
>          }
> 
>          chain forward {
>                  type filter hook postrouting priority filter + 1; policy accept;
>                  ip saddr $private_net add @ip4counters { ip saddr }
>                  ip daddr $private_net add @ip4counters { ip daddr }
>          }
> }
> 
> Querying it with:
> 
> nft list set inet nftmon ip4counters
> 
> is more straightforward listing only the relevant metrics.

In the absence of a timeout, it is probable that the set will become full.

> 
> I could further enhance this with flags timeout for the set and add a 
> timeout of 15min in the add part of the rule filling the set:
> 
> ip saddr 192.168.1.0/24 add @ip4counters { ip saddr timeout 15m }
> 
> 1. The first approach with a named counter and a diff logic in a later 
> stage (scraping script, piece of code) moves load from nftables 
> somewhere else. Is this recommended in comparison to the 
> timeout-flagging of set-variant? Will a counter overflow and break 
> subtraction from time to time? (Uptime is multiple months with 
> sufficient traffic.)

The manual indicates that they are signed 64-bit integers, which is quite generous. However, I am uncertain as to how they wrap around. My guess would be that they count as far as 9223372036854775807 ((1 << 63) - 1) before wrapping around to 0, because wrapping around to a negative number would be confusing. I would appreciate confirmation from a Netfilter developer, one way or the other. Once the wrapping behaviour is confirmed, detecting a wrap should be straightforward.
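To illustrate what the subtraction in a scraping tool might look like, here is a small Python sketch. The wrap point is exactly the part that remains unconfirmed, so it is a parameter (modulus) rather than a fact; the default of 1 << 64 and the function name counter_delta are assumptions of mine. Note also that a reading lower than the previous one could equally mean that the ruleset was reloaded or the counter was reset, which this sketch does not try to distinguish.

```python
def counter_delta(previous: int, current: int, modulus: int = 1 << 64) -> int:
    """Return how much a counter grew between two readings.

    Assumes the counter wrapped at most once, from modulus - 1 back to 0.
    The modulus is an assumption; pass 1 << 63 if the counter turns out
    to wrap after (1 << 63) - 1.
    """
    if current >= previous:
        return current - previous
    # current < previous: treat it as a single wrap-around.
    return current + (modulus - previous)


if __name__ == "__main__":
    print(counter_delta(100, 250))
    print(counter_delta((1 << 63) - 10, 5, modulus=1 << 63))
```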

> 
> 2. For the 2nd approach I assume the single packet matching the rule 
> will end up with a 15m timeout in the set. Thus no entry in the set is 
> older than 15min. So the metrics from this set only span a 15min 
> interval. Is this correct? Asked from a different point of view: when 
> will garbage collection take place clearing the timed out values from 
> the set?

Yes, this is correct. Once an element has timed out, it shall be unceremoniously removed. Further, the timeout value may also be defined by the set itself. A potential issue with this approach is that your data collector ends up racing with the exact time at which a given element is created and/or the exact time at which it is removed.

> 
> 3. The pure counter approach cannot be improved with a garbage 
> collection configuration? This would create 24x4 15min-intervals when 
> running every 15min. Scraping this in between garbage collection runs 
> means missing bandwidth/ packets?

I'm not sure that I understand what a garbage collection configuration would entail. Do you mean for your collector to reset the counters after collecting their values? If so, I would not expect for anything to be missed, provided that the reset command is issued via the same invocation of nft(8) that instructs it to print the counters.

  nft -j 'list counters inet; reset counters inet'

Having said that, I may have uncovered a bug in the course of trying this. I shall explore the matter further.

> 
> 
> 4. Is there a third more efficient/ cheaper approach to define a rule or 
> rules to yield bandwidth/ throughput metrics grouped by IP (or port or 
> whatever the rule is made of) so that only the last n minutes are taken 
> into consideration? (Precise to the minute.)

Short of implementing a custom tool in userspace, I don't think so. Sets are particularly powerful, as they allow for elements to be a tuple of varying data types.
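As a rough idea of what such a userspace tool could do, the following Python sketch keeps per-scrape byte deltas in a queue and sums only those that fall within the last n seconds. Everything here (the SlidingWindow class, its method names, the 900-second window) is hypothetical; how the deltas are obtained from nft is left out. The resolution is bounded by the scrape interval, so scraping once a minute would give a window that is precise to the minute.

```python
from collections import deque


class SlidingWindow:
    """Sum of byte deltas observed during the last window_seconds."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self.samples = deque()  # (timestamp, byte_delta), oldest first
        self.total = 0

    def add(self, timestamp: float, byte_delta: int) -> None:
        """Record the bytes counted since the previous scrape."""
        self.samples.append((timestamp, byte_delta))
        self.total += byte_delta
        self._expire(timestamp)

    def _expire(self, now: float) -> None:
        # Drop samples that are window_seconds old or older.
        cutoff = now - self.window_seconds
        while self.samples and self.samples[0][0] <= cutoff:
            _, old_delta = self.samples.popleft()
            self.total -= old_delta

    def bytes_in_window(self, now: float) -> int:
        self._expire(now)
        return self.total


if __name__ == "__main__":
    window = SlidingWindow(window_seconds=900)  # 15 minutes
    window.add(0, 100)
    window.add(60, 200)
    window.add(900, 50)
    print(window.bytes_in_window(900))
```

One such window per set element or named counter would give the grouping by IP or port.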

> 
> 5. Querying conntrack would be later stage if bandwidth monitoring 
> yields unusual activity. A counter or the set approach requires less 
> resources (CPU, memory). Is this correct?

Based on my own experience, I would expect for conntrack(8) scraping to be less efficient.

-- 
Kerin Millar


