Yi-Hung Wei and Justin Pettit found a race in the garbage collection scheme used by nf_conncount. When we don't find a result for a given tuple that is stored in the conncount list when doing a lookup in the connection tracking table, we remove this tuple from our list because we erronously believed that the entry is stale and conntrack entry long gone. This is the common cause, but turns out its not the only one: The conncount list entry could have been created just before by another cpu, i.e. the conntrack entry might not yet have been inserted into the global hash. The avoid this, we introduce a timestamp and the owning cpu. If the entry appears to be stale, evict only if: 1. The cpu finding its stale is the one that added it 2. The timestamp is older than two jiffies The second constaint allows GC to be taken over by other cpu too (e.g. because a cpu was offlined, or napi got moved around). This most likely also fixes an xt_connlimit imbalance earlier reported by Dmitry Andrianov. Cc: Dmitry Andrianov <dmitry.andrianov@xxxxxxxxxxx> Reported-by: Justin Pettit <jpettit@xxxxxxxxxx> Reported-by: Yi-Hung Wei <yihung.wei@xxxxxxxxx> Signed-off-by: Florian Westphal <fw@xxxxxxxxx> --- Only compile tested so far, sending it now to illustrate idea and so that others can review. net/netfilter/nf_conncount.c | 38 +++++++++++++++++++++++++++++++----- 1 file changed, 33 insertions(+), 5 deletions(-) diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c index d8383609fe28..85aa7be75bbd 100644 --- a/net/netfilter/nf_conncount.c +++ b/net/netfilter/nf_conncount.c @@ -47,6 +47,8 @@ struct nf_conncount_tuple { struct hlist_node node; struct nf_conntrack_tuple tuple; struct nf_conntrack_zone zone; + int cpu; + u32 jiffies32; }; struct nf_conncount_rb { @@ -91,11 +93,40 @@ bool nf_conncount_add(struct hlist_head *head, return false; conn->tuple = *tuple; conn->zone = *zone; + conn->cpu = raw_smp_processor_id(); + conn->jiffies32 = (u32)jiffies + 2; hlist_add_head(&conn->node, head); return true; } EXPORT_SYMBOL_GPL(nf_conncount_add); +static const struct nf_conntrack_tuple_hash * +find_or_evict(struct net *net, struct nf_conncount_tuple *conn) +{ + const struct nf_conntrack_tuple_hash *found; + unsigned long a, b; + int cpu = raw_smp_processor_id(); + + found = nf_conntrack_find_get(net, &conn->zone, &conn->tuple); + if (found) + return found; + + b = conn->jiffies32; + a = (u32)jiffies; + + /* conn might have been added just before by another cpu and + * might still be unconfirmed. In this case, nf_conntrack_find() + * returns no result. Thus only evict if this cpu added the + * stale entry or if the entry is older than two jiffy. + */ + if (conn->cpu == cpu || time_after(a, b)) { + hlist_del(&conn->node); + kmem_cache_free(conncount_conn_cachep, conn); + } + + return NULL; +} + unsigned int nf_conncount_lookup(struct net *net, struct hlist_head *head, const struct nf_conntrack_tuple *tuple, const struct nf_conntrack_zone *zone, @@ -111,12 +142,9 @@ unsigned int nf_conncount_lookup(struct net *net, struct hlist_head *head, /* check the saved connections */ hlist_for_each_entry_safe(conn, n, head, node) { - found = nf_conntrack_find_get(net, &conn->zone, &conn->tuple); - if (found == NULL) { - hlist_del(&conn->node); - kmem_cache_free(conncount_conn_cachep, conn); + found = find_or_evict(net, conn); + if (found == NULL) continue; - } found_ct = nf_ct_tuplehash_to_ctrack(found); -- 2.17.1 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html