Re: [PATCH nf] netfilter: conntrack: refine gc worker heuristics, redux

Florian Westphal <fw@xxxxxxxxx> · Mon, 16 Jan 2017 18:33:49 +0100

Nicolas Dichtel <nicolas.dichtel@xxxxxxxxx> wrote:
> Le 16/01/2017 à 11:56, Florian Westphal a écrit :
> > This further refines the changes made to conntrack gc_worker in
> > commit e0df8cae6c16 ("netfilter: conntrack: refine gc worker heuristics").
> > 
> > The main of this change is to reduce the scan interval when evictions
> > take place.
> > 
> > However, on the reporters' setup, there are 1-2 Million conntrack entries
> > in total, and roughly 8k new connections per second.
> > 
> > In this case we'll always evict at least one entry per gc cycle and scan
> > interval is always at 1 jiffy.
> > 
> > Given we scan ~10k entries per run its clearly wrong to reduce interval
> > based on number of evictions, it will only waste cpu cycles.
> > 
> > Thus only look at the ratio (scanned entries vs. evicted entries) to make
> > a decision on whether to reduce or not.
> > 
> > Given the fact that evictor is supposed to only kick in when system turns
> > idle after a busy period, we should pick a high ratio -- this makes it 50%.
> > 
> > Also, get rid of GC_MAX_EVICTS.  Instead of breaking loop and instant
> > resched, just don't bother checking this at all (the loop calls
> > cond_resched for every bucket anyway).
> > 
> > Removal of GC_MAX_EVICTS was suggested by Nicolas in the past; I
> > should have listened.
> I (still) agree with that part, maybe it could be done in another patch.

Done.

> > Fixes: b87a2f9199ea82eaadc ("netfilter: conntrack: add gc worker to remove timed-out entries")
> > Signed-off-by: Florian Westphal <fw@xxxxxxxxx>
> With that patch, it's worse than before. I regularly have a delay > 30 seconds
> in my tests (this was infrequent before the patch).

:-/

We could lower the ratio of course, but under synflood with 3 second syn_recv timeout
I get a scan/evict ratio of ~15%, and we should not schedule worker all the time in that case.

> A solution, to fix the balance between cpu cycles consumption and short ct
> events delays may be to introduce a new sysctl which allows the user to choose
> between two predefined configs: short delay or long delay for ct events.
> Even if I don't like this, maybe it could be acceptable with a default value set
> to 'short delay'.

TBH in that case I'd prefer to just give up, and change worker to this:

        for (i = 0; i < nf_conntrack_htable_size; i++) {
         [..]
                hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[i], hnnode) {
                        tmp = nf_ct_tuplehash_to_ctrack(h);

                        if (nf_ct_is_expired(tmp)) {
                                nf_ct_gc_expired(tmp);
                                continue;
                        }
                }
	[..]
             cond_resched_rcu_qs();
        }

        if (!gc_work->exiting)
                queue_delayed_work(system_long_wq, &gc_work->dwork, nf_conntrack_gc_interval);
}

Where nf_conntrack_gc_interval gets exported to userspace (in milliseconds),
with a default set e.g. to one minute.

Obvious downside: with low setting like 1 second we'll do frequent
rescan of entire table.

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html