On 03/11/2016 at 00:04, Florian Westphal wrote:
> Nicolas Dichtel says:
> After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to
> remove timed-out entries"), netlink conntrack deletion events may be
> sent with a huge delay.
>
> Nicolas further points at this line:
>
> goal = min(nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV, GC_MAX_BUCKETS);
>
> and indeed, this isn't optimal at all. Rationale here was to ensure that
> we don't block other work items for too long, even if
> nf_conntrack_htable_size is huge. But in order to have some guarantee
> about maximum time period where a scan of the full conntrack table
> completes we should always use a fixed slice size, so that once every
> N scans the full table has been examined at least once.
>
> We also need to balance this vs. the case where the system is either idle
> (i.e., conntrack table (almost) empty) or very busy (i.e. eviction happens
> from packet path).
>
> So, after some discussion with Nicolas:
>
> 1. want hard guarantee that we scan entire table at least once every X s
> -> need to scan fraction of table (get rid of upper bound)
>
> 2. don't want to eat cycles on idle or very busy system
> -> increase interval if we did not evict any entries
>
> 3. don't want to block other worker items for too long
> -> make fraction really small, and prefer small scan interval instead
>
> 4. Want reasonable short time where we detect timed-out entry when
> system went idle after a burst of traffic, while not doing scans
> all the time.
> -> Store next gc scan in worker, increasing delays when no eviction
> happened and shrinking delay when we see timed out entries.
>
> The old gc interval is turned into a max number, scans can now happen
> every jiffy if stale entries are present.
>
> Reported-by: Nicolas Dichtel <nicolas.dichtel@xxxxxxxxx>
> Signed-off-by: Florian Westphal <fw@xxxxxxxxx>
> ---
> Change since v1: use system_long_wq instead of normal system wq (suggested by
> Eric Dumazet).
>
> Nicolas is currently away; I would like to get his feedback on this one
> before it gets applied.

Thank you for the update.

With that patch, some events are still delayed by more than 2 minutes,
which I think is too much.

If I'm not wrong, the worst-case delay with this patch is:

  10 (GC_INTERVAL_MAX) + 0.001 + 5.001 + 5.002 + 5.003 + ... + 6.024
  (where 6.024 = 5 secs + 1024 msecs)
= 10 + 0.001 + 5 x 1024 + (1 + 2 + 3 + ... + 1024) / 1000
= 10 + 0.001 + 5 x 1024 + (1024 x 1025 / 2) / 1000
= 10.001 + 5120 + 524.8
= 5654.801 seconds ~= 94 minutes

I take the case where gc_work->next_gc_run == GC_INTERVAL_MAX (10 seconds).
Then an entry is evicted (gc_work->next_gc_run /= 2U, i.e. 5 seconds, and
next_run is set to 0.001 seconds), and the next entry to evict needs a full
table scan, i.e. 1024 (GC_MAX_BUCKETS_DIV) rounds (we add 1 msec at each
round).

Even if we start from a delay of 0, a full scan needs:

1 + 2 + 3 + ... + 1024 = 1024 x 1025 / 2 = 524800 msecs ~= 8.7 minutes

Previously (in private discussions), you proposed an algorithm that
guarantees a full table scan within a predefined delay. A "good" solution
would have such a guarantee.

Regards,
Nicolas