Hi Pablo,

I was now able to test your enhancements to conntrackd.

Pablo Neira Ayuso wrote:
> I think that I have reproduced your problem in my testbed. Say you have
> two nodes: A and B. Initially, A is primary and B is backup.
>
> 1) You generate tons of HTTP traffic: A successfully replicates states
>    to B.
> 2) You trigger the fail-over: B becomes primary and A becomes backup.
>    B successfully recovers the connections. Moreover, if you do
>    `conntrack -L -p tcp' in A, you see lots of entries.
> 3) Just a bit later - 30 seconds later or so - you trigger the
>    fail-over again from B to A. In this case, A fails to recover the
>    entries, showing tons of INVALID messages.

Well, not exactly: my problem already occurs in step 2.

Before starting the test, I stop conntrackd on both nodes, clear the
connections from the kernel table with 'conntrack -F' and start
conntrackd again, so both nodes start with empty connection tracking
tables. Then I generate the HTTP traffic and trigger the fail-over.
I already see the INVALID messages on this first fail-over, as soon as
there are more than about 500 (with NAT) to 750 (without NAT) parallel
TCP sessions being set up and torn down rapidly. (The sequence I use is
sketched at the end of this mail.)

I am not sure where the bottleneck is in this case: neither the CPUs of
the nodes nor the bandwidth of the "node interconnect" (dedicated
interfaces) are busy at all.

> This problem is fixed in the git repository. Now, we purge the entries
> in A once this node becomes backup, after 15 seconds - this parameter
> is tunable via PurgeTimeout. Thus, the old entries do not clash with
> the brand new ones.

I compiled the current version from git. Unfortunately, it does not
change the results for me. (My understanding of the PurgeTimeout
setting is also sketched below.)

best regards
Bernhard
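
For reference, the test sequence is roughly the following; the load
generator, the address and the fail-over trigger shown here are only
placeholders, not the exact commands from my setup:

  # on both nodes: restart conntrackd with empty tables
  conntrackd -k                # stop the running daemon
  conntrack -F                 # flush the kernel conntrack table
  conntrackd -d                # start conntrackd again in daemon mode

  # generate many short-lived HTTP connections through the cluster
  # (ab is only a placeholder for the load generator)
  ab -n 100000 -c 750 http://192.0.2.1/ &

  # trigger the fail-over; how this happens depends on the HA software,
  # but on the node that becomes primary the usual recovery step is
  conntrackd -c                # commit the external cache to the kernel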
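
Regarding PurgeTimeout: if I read the example configuration correctly,
it is set in the Mode section of the Sync block in conntrackd.conf,
along these lines (FTFW mode and the value of 60 are just an example):

  Sync {
      Mode FTFW {
          # purge the kernel table this many seconds after the node
          # has become backup (placement and value only as an example)
          PurgeTimeout 60
      }
      # ... transport (multicast/UDP) settings unchanged ...
  }

Please correct me if that is not where it belongs.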