On Wed, Dec 1, 2021 at 1:24 PM Florian Westphal <fw@xxxxxxxxx> wrote: > > Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote: > > > I do not see how. > > > > I started a patchset, but the single hashtable for every netns might > > defeat the batching, if there is a table per netns then it should be > > similar to 67cc570edaa0. > > I see. > > > > for going with 2m. > > > > Default 2m is probably too large? This should be set at least to the > > UDP unreplied timeout, ie. 30s? > > Perhaps but I don't think 30s is going to resolve the issue at hand > (burstiness). > > > Probably set default scan interval to 20s and reduce it if there is > > workload coming in the next scan round? It is possible to count for > > the number of entries that will expired in the next round, if this > > represents a large % of entries, then reduce the scan interval of the > > vgarbage collector. > > I don't see how thios helps, see below. > > > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c > > index 770a63103c7a..3f6731a9c722 100644 > > --- a/net/netfilter/nf_conntrack_core.c > > +++ b/net/netfilter/nf_conntrack_core.c > > @@ -77,7 +77,8 @@ static __read_mostly bool nf_conntrack_locks_all; > > /* serialize hash resizes and nf_ct_iterate_cleanup */ > > static DEFINE_MUTEX(nf_conntrack_mutex); > > > > -#define GC_SCAN_INTERVAL (120u * HZ) > > +/* Scan every 20 seconds which is 2/3 of the UDP unreplied timeout. */ > > +#define GC_SCAN_INTERVAL (20u * HZ) > > #define GC_SCAN_MAX_DURATION msecs_to_jiffies(10) > > > > #define MIN_CHAINLEN 8u > > @@ -1418,12 +1419,22 @@ static bool gc_worker_can_early_drop(const struct nf_conn *ct) > > return false; > > } > > > > +static bool nf_ct_is_expired_next_run(const struct nf_conn *ct) > > +{ > > + unsigned long next_timestamp = nfct_time_stamp + GC_SCAN_INTERVAL; > > + > > + return (__s32)(ct->timeout - next_timestamp) <= 0; > > +} > > Ok. > > > static void gc_worker(struct work_struct *work) > > { > > + unsigned long next_run_expired_entries = 0, entries = 0, idle; > > unsigned long end_time = jiffies + GC_SCAN_MAX_DURATION; > > unsigned int i, hashsz, nf_conntrack_max95 = 0; > > unsigned long next_run = GC_SCAN_INTERVAL; > > struct conntrack_gc_work *gc_work; > > + bool next_run_expired; > > + > > gc_work = container_of(work, struct conntrack_gc_work, dwork.work); > > > > i = gc_work->next_bucket; > > @@ -1448,6 +1459,8 @@ static void gc_worker(struct work_struct *work) > > struct nf_conntrack_net *cnet; > > struct net *net; > > > > + next_run_expired = false; > > + entries++; > > tmp = nf_ct_tuplehash_to_ctrack(h); > > > > if (test_bit(IPS_OFFLOAD_BIT, &tmp->status)) { > > @@ -1458,6 +1471,9 @@ static void gc_worker(struct work_struct *work) > > if (nf_ct_is_expired(tmp)) { > > nf_ct_gc_expired(tmp); > > continue; > > + } else if (nf_ct_is_expired_next_run(tmp)) { > > + next_run_expired = true; > > + next_run_expired_entries++; > > This means this expires within next 20s, but not now. > > > @@ -1511,7 +1531,22 @@ static void gc_worker(struct work_struct *work) > > if (next_run) { > > gc_work->early_drop = false; > > gc_work->next_bucket = 0; > > + /* > > + * Calculate gc workload for the next run, adjust the gc > > + * interval not to reap expired entries in bursts. > > + * > > + * Adjust scan interval linearly based on the percentage of > > + * entries that will expire in the next run. The scan interval > > + * is inversely proportional to the workload. > > + */ > > + if (entries == 0) { > > + next_run = GC_SCAN_INTERVAL; > > + } else { > > + idle = 100u - (next_run_expired_entries * 100u / entries); > > + next_run = GC_SCAN_INTERVAL * idle / 100u; > > AFAICS we may now schedule next run for 'right now' even though that > would not find any expired entries (they might all have a timeout of > 19s). Next round would reap no entries, then resched again immediately > > (the nf_ct_is_expired_next_run expire count assumes next run is in > 20s, not before). > > This would burn cycles for 19s before those entries can be expired. > > Not sure how to best avoid this, perhaps computing the remaining avg timeout > of the nf_ct_is_expired_next_run() candidates would help? At least for us - where our load is mostly constant - using an avg seems like a good approach.