Sorry, I re-worked the patch, because it was buggy (ct_general.use was incremented, not tested). Here is the (hopefully) correct one: --- nf_conntrack_core.c 2009-10-23 00:57:56.000000000 +0200 +++ nf_conntrack_core_modded.c 2009-11-01 23:56:23.000000000 +0100 @@ -506,14 +506,12 @@ hlist_nulls_for_each_entry_rcu(h, n, &net->ct.hash[hash], hnnode) { tmp = nf_ct_tuplehash_to_ctrack(h); - if (!test_bit(IPS_ASSURED_BIT, &tmp->status)) + if (!test_bit(IPS_ASSURED_BIT, &tmp->status) && likely(!nf_ct_is_dying(ct) && + atomic_read(&ct->ct_general.use)!=0)) ct = tmp; cnt++; } - if (ct && unlikely(nf_ct_is_dying(ct) || - !atomic_inc_not_zero(&ct->ct_general.use))) - ct = NULL; if (ct || cnt >= NF_CT_EVICTION_RANGE) break; hash = (hash + 1) % nf_conntrack_htable_size; @@ -523,6 +521,7 @@ if (!ct) return dropped; + atomic_inc(&ct->ct_general.use); if (del_timer(&ct->timeout)) { death_by_timeout((unsigned long)ct); dropped = 1; On Sun, Nov 1, 2009 at 10:48 PM, <pesce.luca@xxxxxxxxx> wrote: > I noticed that early_drop(), before killing a conntrack, checks if it is > already dying or > if its usage count is zero: if one of these two conditions is true, the > conntrack > is not killed, which is obviously fine. However, these checks are performed > outside > the per-chain loop: if I correctly understand the code, this could lead to > skip a killable conntrack > that was before (so more recently used) the least recently used one that is > already dying. > For example, consider the case of a hash chain with 10 entries: > > - entry number 1 is NOT ASSURED, is not already dying and has ct_general.use >>=1, so it > is a good candidate > - entries from 2 to 9 are ASSURED - not candidates for destroying > - entry 10 is NOT ASSURED, but is already dying by itself (or has > ct_general.use==0) > > The conntrack examined are only those inside this chain (since 10 > > NF_CT_EVICTION_RANGE), > so the chain loop will end with ct pointing to entry number 10, which cannot > be > killed. In this way, early_drop() fails to free a hastable entry, while > entry 1 > was a good candidate. > This is just one case, but not the only, I think... > If this analysis is correct, maybe the code should check the dying status > and the > usage count together with the assured bit. > I am attaching the diff coming from this idea. > > If I did not correctly understood the code, I am sorry, I apologize for > that! > > > --- nf_conntrack_core.c 2009-10-23 00:57:56.000000000 +0200 > +++ nf_conntrack_core_modded.c 2009-11-01 22:22:33.000000000 +0100 > @@ -506,14 +506,12 @@ > hlist_nulls_for_each_entry_rcu(h, n, &net->ct.hash[hash], > hnnode) { > tmp = nf_ct_tuplehash_to_ctrack(h); > - if (!test_bit(IPS_ASSURED_BIT, &tmp->status)) > + if (!test_bit(IPS_ASSURED_BIT, &tmp->status) && > unlikely(!nf_ct_is_dying(ct) && > + atomic_inc_not_zero(&ct->ct_general.use))) > ct = tmp; > cnt++; > } > > - if (ct && unlikely(nf_ct_is_dying(ct) || > - !atomic_inc_not_zero(&ct->ct_general.use))) > - ct = NULL; > if (ct || cnt >= NF_CT_EVICTION_RANGE) > break; > hash = (hash + 1) % nf_conntrack_htable_size; -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html