If I may add myself to the discussion: Florian's tunable solution is
good enough for both worlds: a high gc interval keeps idle VMs from
waking up unnecessarily often, and a low gc interval prevents dropped
conntrack events on systems with busy, very large conntrack tables
that consume conntrack events through netlink.

I'm not sure about the default value, though. Two minutes means
dropping events for some systems, i.e. breaking functionality compared
to the previous gc solution. For VMs, a few extra wakeups shouldn't
break anything, I would guess (apart from slightly higher load). So a
good default would be going back to hundreds of milliseconds, or at
least to seconds. Two minutes causes dropped conntrack events here
even with a 100MB netlink socket buffer (several thousand events per
second, conntrack max 1M, hash table size 1M).

Karel

On Tue, Nov 23, 2021 at 3:01 PM Eyal Birger <eyal.birger@xxxxxxxxx> wrote:
>
> Hi Pablo,
>
> On Tue, Nov 23, 2021 at 3:24 PM Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > On Sun, Nov 21, 2021 at 06:05:14PM +0100, Florian Westphal wrote:
> > > As of commit 4608fdfc07e1
> > > ("netfilter: conntrack: collect all entries in one cycle"),
> > > conntrack gc was changed to run periodically every 2 minutes.
> > >
> > > On systems where the conntrack hash table is set to a large value,
> > > almost all evictions happen from the gc worker rather than the
> > > packet path, due to hash table distribution.
> > >
> > > This causes netlink event overflows when the events are collected.
> >
> > If the issue is netlink, it should be possible to batch netlink
> > events.
> >
> > > This change exposes two sysctls:
> > >
> > > 1. gc interval (milliseconds, default: 2 minutes)
> > > 2. buckets per cycle (default: UINT_MAX / all)
> > >
> > > This allows increasing the scan interval, but also reducing
> > > burstiness by switching to partial scans of the table in each
> > > cycle.
> >
> > Is there a way to apply autotuning? I know, this question might be
> > hard, but when does the user have to update this new toggle? And do
> > we know what value should be placed here?
> >
> > @Eyal: What gc interval did you select for your setup to address
> > this issue? You mentioned a lot of UDP short-lived flows, correct?
>
> Yes, we have a lot of short-lived UDP flows related to a DNS server service.
>
> We collect flow termination events using ulogd and forward them as JSON
> messages over UDP to fluentd. Since these flows are reaped every 2 minutes,
> we see spikes in UDP rx drops due to fluentd not keeping up with the bursts.
>
> We planned to configure this to run every 10s or so, which should be
> sufficient for our workloads, and monitor these spikes in order to tune
> further as needed.
>
> Eyal.
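
For a concrete sense of what the two proposed knobs trade off, here is a
back-of-the-envelope sketch. It assumes Eyal's 10s interval, the 1M-bucket
hash table mentioned above, and a hypothetical 64k buckets-per-cycle
setting; the patch's actual knob names and defaults may differ.

/* Illustrative arithmetic only, not the patch's implementation:
 * with a partial scan, the per-wakeup event burst shrinks by the
 * same factor the full-sweep period grows. */
#include <stdio.h>

int main(void)
{
	unsigned int buckets_total = 1U << 20;   /* hash table size 1M */
	unsigned int per_cycle = 1U << 16;       /* buckets scanned per cycle */
	unsigned int interval_ms = 10000;        /* gc interval: 10s */
	unsigned int cycles = (buckets_total + per_cycle - 1) / per_cycle;

	printf("cycles per full sweep: %u\n", cycles);            /* 16 */
	printf("full sweep every %u ms\n", cycles * interval_ms); /* 160s */
	printf("burst: ~1/%u of the table per wakeup\n", cycles);
	return 0;
}

Halving buckets-per-cycle halves the burst each wakeup can generate, but
doubles the worst-case time before the gc worker revisits a given bucket,
so expired entries linger (and are reported) correspondingly later.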
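
On the receiving side, the overflow Karel measures shows up as ENOBUFS on
the event socket: when the kernel cannot queue an event because the
receive buffer is full, the next recv() fails with that errno. Below is a
minimal standalone listener sketch (not ulogd's implementation; the 100MB
buffer size and the DESTROY-only subscription are illustrative) that
enlarges its receive buffer and counts overruns.

#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/netfilter/nfnetlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif

int main(void)
{
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_NETFILTER);
	int rcvbuf = 100 * 1024 * 1024;   /* 100MB, as in the report above */
	int group = NFNLGRP_CONNTRACK_DESTROY;
	struct sockaddr_nl addr = { .nl_family = AF_NETLINK };
	char buf[64 * 1024];
	unsigned long overruns = 0;

	if (fd < 0)
		return 1;

	/* SO_RCVBUFFORCE ignores rmem_max but needs CAP_NET_ADMIN;
	 * fall back to SO_RCVBUF otherwise. */
	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUFFORCE,
		       &rcvbuf, sizeof(rcvbuf)) < 0)
		setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
			   &rcvbuf, sizeof(rcvbuf));

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		return 1;

	/* Subscribe to flow termination (destroy) events. */
	if (setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
		       &group, sizeof(group)) < 0)
		return 1;

	for (;;) {
		ssize_t n = recv(fd, buf, sizeof(buf), 0);

		if (n < 0 && errno == ENOBUFS) {
			/* Kernel dropped events between our reads:
			 * exactly the burst a shorter or partial gc
			 * cycle is meant to smooth out. */
			fprintf(stderr, "overrun #%lu\n", ++overruns);
			continue;
		}
		if (n < 0)
			break;
		/* parse the nlmsghdr chain in buf[0..n) here */
	}
	close(fd);
	return 0;
}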