Nicolas Dichtel <nicolas.dichtel@xxxxxxxxx> wrote: > Le 16/01/2017 à 11:56, Florian Westphal a écrit : > > This further refines the changes made to conntrack gc_worker in > > commit e0df8cae6c16 ("netfilter: conntrack: refine gc worker heuristics"). > > > > The main of this change is to reduce the scan interval when evictions > > take place. > > > > However, on the reporters' setup, there are 1-2 Million conntrack entries > > in total, and roughly 8k new connections per second. > > > > In this case we'll always evict at least one entry per gc cycle and scan > > interval is always at 1 jiffy. > > > > Given we scan ~10k entries per run its clearly wrong to reduce interval > > based on number of evictions, it will only waste cpu cycles. > > > > Thus only look at the ratio (scanned entries vs. evicted entries) to make > > a decision on whether to reduce or not. > > > > Given the fact that evictor is supposed to only kick in when system turns > > idle after a busy period, we should pick a high ratio -- this makes it 50%. > > > > Also, get rid of GC_MAX_EVICTS. Instead of breaking loop and instant > > resched, just don't bother checking this at all (the loop calls > > cond_resched for every bucket anyway). > > > > Removal of GC_MAX_EVICTS was suggested by Nicolas in the past; I > > should have listened. > I (still) agree with that part, maybe it could be done in another patch. Done. > > Fixes: b87a2f9199ea82eaadc ("netfilter: conntrack: add gc worker to remove timed-out entries") > > Signed-off-by: Florian Westphal <fw@xxxxxxxxx> > With that patch, it's worse than before. I regularly have a delay > 30 seconds > in my tests (this was infrequent before the patch). :-/ We could lower the ratio of course, but under synflood with 3 second syn_recv timeout I get a scan/evict ratio of ~15%, and we should not schedule worker all the time in that case. > A solution, to fix the balance between cpu cycles consumption and short ct > events delays may be to introduce a new sysctl which allows the user to choose > between two predefined configs: short delay or long delay for ct events. > Even if I don't like this, maybe it could be acceptable with a default value set > to 'short delay'. TBH in that case I'd prefer to just give up, and change worker to this: for (i = 0; i < nf_conntrack_htable_size; i++) { [..] hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[i], hnnode) { tmp = nf_ct_tuplehash_to_ctrack(h); if (nf_ct_is_expired(tmp)) { nf_ct_gc_expired(tmp); continue; } } [..] cond_resched_rcu_qs(); } if (!gc_work->exiting) queue_delayed_work(system_long_wq, &gc_work->dwork, nf_conntrack_gc_interval); } Where nf_conntrack_gc_interval gets exported to userspace (in milliseconds), with a default set e.g. to one minute. Obvious downside: with low setting like 1 second we'll do frequent rescan of entire table. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html