If I may add myself to the discussion: Florian's tunable solution is
good enough for both worlds: a high gc interval keeps idle VMs from
waking up unnecessarily often, and a low gc interval prevents dropped
conntrack events on systems with busy, very large conntrack tables
that consume conntrack events through netlink.

I'm not sure about the default value, though. Two minutes means
dropping events for some systems, i.e. breaking functionality compared
to the previous gc solution. For VMs, a few extra wakeups shouldn't
break anything, I would guess (apart from slightly higher load). So a
good default would be going back to hundreds of milliseconds, or at
least to seconds. Two minutes causes dropped conntrack events here
even with a 100MB netlink socket buffer (several thousand events per
second, conntrack max 1M, hash table size 1M).

Karel

On Tue, Nov 23, 2021 at 3:01 PM Eyal Birger <eyal.birger@xxxxxxxxx> wrote:
>
> Hi Pablo,
>
> On Tue, Nov 23, 2021 at 3:24 PM Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > On Sun, Nov 21, 2021 at 06:05:14PM +0100, Florian Westphal wrote:
> > > As of commit 4608fdfc07e1
> > > ("netfilter: conntrack: collect all entries in one cycle"),
> > > conntrack gc was changed to run periodically every 2 minutes.
> > >
> > > On systems where the conntrack hash table is set to a large value,
> > > almost all evictions happen from the gc worker rather than the
> > > packet path, due to hash table distribution.
> > >
> > > This causes netlink event overflows when the events are collected.
> >
> > If the issue is netlink, it should be possible to batch netlink
> > events.
> >
> > > This change exposes two sysctls:
> > >
> > > 1. gc interval (milliseconds, default: 2 minutes)
> > > 2. buckets per cycle (default: UINT_MAX / all)
> > >
> > > This allows increasing the scan interval, but also reducing
> > > burstiness by switching to partial scans of the table in each
> > > cycle.
> >
> > Is there a way to apply autotuning? I know, this question might be
> > hard, but when does the user have to update this new toggle? And do
> > we know what value should be placed here?
> >
> > @Eyal: What gc interval did you select for your setup to address
> > this issue? You mentioned a lot of UDP short-lived flows, correct?
>
> Yes, we have a lot of short-lived UDP flows related to a DNS server service.
>
> We collect flow termination events using ulogd and forward them as JSON
> messages over UDP to fluentd. Since these flows are reaped every 2 minutes,
> we see spikes in UDP rx drops due to fluentd not keeping up with the bursts.
>
> We planned to configure this to run every 10s or so, which should be
> sufficient for our workloads, and monitor these spikes in order to tune
> further as needed.
>
> Eyal.
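
For a concrete sense of what the two proposed knobs trade off, here is a
back-of-the-envelope sketch. It assumes Eyal's 10s interval, the 1M-bucket
hash table mentioned above, and a hypothetical 64k buckets-per-cycle
setting; the patch's actual knob names and defaults may differ.

/* Illustrative arithmetic only, not the patch's implementation:
 * with a partial scan, the per-wakeup event burst shrinks by the
 * same factor the full-sweep period grows. */
#include <stdio.h>

int main(void)
{
	unsigned int buckets_total = 1U << 20;   /* hash table size 1M */
	unsigned int per_cycle = 1U << 16;       /* buckets scanned per cycle */
	unsigned int interval_ms = 10000;        /* gc interval: 10s */
	unsigned int cycles = (buckets_total + per_cycle - 1) / per_cycle;

	printf("cycles per full sweep: %u\n", cycles);            /* 16 */
	printf("full sweep every %u ms\n", cycles * interval_ms); /* 160s */
	printf("burst: ~1/%u of the table per wakeup\n", cycles);
	return 0;
}

Halving buckets-per-cycle halves the burst each wakeup can generate, but
doubles the worst-case time before the gc worker revisits a given bucket,
so expired entries linger (and are reported) correspondingly later.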
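
On the receiving side, the overflow Karel measures shows up as ENOBUFS on
the event socket: when the kernel cannot queue an event because the
receive buffer is full, the next recv() fails with that errno. Below is a
minimal standalone listener sketch (not ulogd's implementation; the 100MB
buffer size and the DESTROY-only subscription are illustrative) that
enlarges its receive buffer and counts overruns.

#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/netfilter/nfnetlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif

int main(void)
{
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_NETFILTER);
	int rcvbuf = 100 * 1024 * 1024;   /* 100MB, as in the report above */
	int group = NFNLGRP_CONNTRACK_DESTROY;
	struct sockaddr_nl addr = { .nl_family = AF_NETLINK };
	char buf[64 * 1024];
	unsigned long overruns = 0;

	if (fd < 0)
		return 1;

	/* SO_RCVBUFFORCE ignores rmem_max but needs CAP_NET_ADMIN;
	 * fall back to SO_RCVBUF otherwise. */
	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUFFORCE,
		       &rcvbuf, sizeof(rcvbuf)) < 0)
		setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
			   &rcvbuf, sizeof(rcvbuf));

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		return 1;

	/* Subscribe to flow termination (destroy) events. */
	if (setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
		       &group, sizeof(group)) < 0)
		return 1;

	for (;;) {
		ssize_t n = recv(fd, buf, sizeof(buf), 0);

		if (n < 0 && errno == ENOBUFS) {
			/* Kernel dropped events between our reads:
			 * exactly the burst a shorter or partial gc
			 * cycle is meant to smooth out. */
			fprintf(stderr, "overrun #%lu\n", ++overruns);
			continue;
		}
		if (n < 0)
			break;
		/* parse the nlmsghdr chain in buf[0..n) here */
	}
	close(fd);
	return 0;
}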