Hi Bernhard,

Bernhard Bock wrote:
> My next step is to run two firewalls in a cluster with conntrackd.
>
> The basic setup works like a charm. I have increased the HashSize
> parameter in conntrackd as well. It replicates the states to the backup
> firewall just fine.
>
> Unfortunately, failover works only in about 50% of all tests. There is
> no obvious pattern as to when these failures occur.
>
> We trigger the failover softly by advertising a higher priority on the
> backup firewall, not by switching off the primary one. If it goes well,
> we do not lose a single connection. If it doesn't go well, we basically
> lose all connections and the apachebench dies. There are hundreds of
> INVALID packets in the syslog, and also some NEW (not SYN). In this
> case, we also see lost packets in "multicast sequence tracking" in the
> conntrackd stats.

I think that I have reproduced your problem in my testbed. Say you have
two nodes: A and B. Initially, A is primary and B is backup.

1) You generate tons of HTTP traffic: A successfully replicates states
   to B.

2) You trigger the fail-over: B becomes primary and A becomes backup. B
   successfully recovers the connections. Moreover, if you do
   `conntrack -L -p tcp' in A, you still see lots of entries.

3) Just a bit later, 30 seconds later or so, you trigger the fail-over
   again, this time from B to A. In this case, A fails to recover the
   entries and logs tons of INVALID messages.

The problem is the entries that are stuck in A (see step 2). Those stale
entries clash with the newly committed entries, and the TCP state
tracking code gets confused by the old state information.

This problem is fixed in the git repository. Now we purge the entries in
A 15 seconds after this node becomes backup; this delay is tunable via
PurgeTimeout. Thus, the old entries do not clash with the brand new
ones.

Moreover, I have completely reworked the fail-over script; you can find
it under doc/ in the conntrack-tools git tree [1]. You may give it a
try.

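To illustrate why the stale entries hurt and why the purge helps, here
is a toy model of the three steps above. It is only a sketch under my
own simplifying assumptions: the Node class, its methods and the
simulate() helper are hypothetical names made up for this example, and
the "clash" rule (a stale entry whose TCP state differs from the
replicated one) is a stand-in for what the kernel's TCP tracking really
does, not a description of the conntrackd or kernel code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical flow key: (src ip, src port, dst ip, dst port)
Flow = Tuple[str, int, str, int]


@dataclass
class Node:
    """Toy firewall node with a kernel-like table of flow -> TCP state."""
    name: str
    table: Dict[Flow, str] = field(default_factory=dict)
    purge_at: Optional[float] = None  # time of the scheduled purge, if any

    def become_backup(self, now: float, purge_timeout: Optional[float]) -> None:
        # Schedule a purge of the local table; None models the old
        # behaviour in which entries were never purged.
        self.purge_at = None if purge_timeout is None else now + purge_timeout

    def tick(self, now: float) -> None:
        # Run the scheduled purge once its deadline has passed.
        if self.purge_at is not None and now >= self.purge_at:
            self.table.clear()
            self.purge_at = None

    def become_primary(self, replicated: Dict[Flow, str]) -> List[Flow]:
        # Commit the replicated states and report the flows for which a
        # stale local entry with a different state got in the way.
        self.purge_at = None
        clashes = []
        for flow, state in replicated.items():
            old = self.table.get(flow)
            if old is not None and old != state:
                clashes.append(flow)  # the stale entry stays in place
            else:
                self.table[flow] = state
        return clashes


def simulate(purge_timeout: Optional[float]) -> int:
    """Fail over A -> B at t=0 and back B -> A at t=30; count clashes."""
    flow = ("10.0.0.1", 40000, "10.0.0.2", 80)
    a = Node("A", {flow: "ESTABLISHED"})
    a.become_backup(now=0.0, purge_timeout=purge_timeout)
    # While B is primary the connection moves on to another TCP state.
    replicated_from_b = {flow: "FIN_WAIT"}
    a.tick(now=30.0)
    return len(a.become_primary(replicated_from_b))


if __name__ == "__main__":
    print("no purge:", simulate(None), "clash(es)")
    print("purge after 15s:", simulate(15.0), "clash(es)")
```

In this model, a purge that fires before the second fail-over leaves A
with an empty table, so the replicated states are committed cleanly. A
purge delay longer than the time between the two fail-overs behaves like
no purge at all, which is why the delay is worth tuning.
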
I expect to release a new version of conntrack-tools with these updates
soon. New (more complete) documentation is also on the way.

Please let me know how it goes.

[1] http://git.netfilter.org

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers