-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

I am in an enterprise environment and I'm having some problems with conntrack specifically.

We have a system that acts as a router; however, any new inbound connection for any machine behind this router is redirected to a specific port on the local machine, where an application responds as if it were the system behind the router.

These systems experience some very high volumes of traffic (sustaining over 30 Mbit/s). Here's a breakdown of TCP socket connections by status at one particular point in time:

    ESTABLISHED : 1363
    LAST_ACK    : 27
    TIME_WAIT   : 616
    FIN_WAIT2   : 8
    FIN_WAIT1   : 140
    SYN_RECV    : 6188
    CLOSE_WAIT  : 365
    LISTEN      : 3
    CLOSING     : 5

We have multiple systems performing this task (essentially for load balancing and to remove a single point of failure). The systems are dual 1 GHz Pentium IIIs with 1-2 GB of RAM, so they're not shy systems. They're running 2.4.20 kernels (mostly vanilla) with iptables 1.2.7a.

Here are some system limits I am tweaking (i.e. the commands to do the tweaking):

    echo 1 >/proc/sys/net/ipv4/ip_forward
    echo 524280 >/proc/sys/fs/file-max
    echo 524280 >/proc/sys/net/ipv4/ip_conntrack_max
    echo 65535 >/proc/sys/net/ipv4/ip_queue_maxlen
    echo 65535 >/proc/sys/net/ipv4/tcp_max_syn_backlog
    # NONE EST SYN_S SYN_R FIN_W TIME_W CLOSE CLOSE_W LAST_A LISTEN
    echo "1800 21600 120 60 30 30 10 30 30 120" > /proc/sys/net/ipv4/ip_conntrack_tcp_timeouts
    ulimit -H -n 524280
    ulimit -S -n 524280
    iptables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu

Every new connection to one of the systems 'behind' these routers needs to be redirected to a local port, which is achieved with the command:

    /sbin/iptables -t nat -A PREROUTING -j DNAT -i eth0 -p tcp -d <ip-range> \
        --destination-port 1024:65535 --to-destination <local_ip>:<local_port>

As a result, every inbound connection incurs an entry in the connection tracking table. It seems, however, that we may be overloading the conntrack system.
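
For anyone wanting to reproduce the socket breakdown earlier in this message on their own box, it can be generated roughly as follows. This is a minimal sketch, not necessarily the exact command used here: the function name count_socket_states is made up for illustration, and it assumes `netstat -tan` output in which the state is the 6th column of every line starting with "tcp".

```shell
# Count TCP sockets per state from `netstat -tan`-style input on stdin.
# Assumption: lines of interest start with "tcp" and carry the state in column 6.
# count_socket_states is a hypothetical helper name, not an existing tool.
count_socket_states() {
    awk '$1 ~ /^tcp/ { n[$6]++ }
         END { for (s in n) printf "%-12s: %d\n", s, n[s] }'
}

# Usage on the router itself:
#   netstat -tan | count_socket_states
```

The order of the output lines is whatever awk's array traversal yields, so pipe it through sort if a stable order matters.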

I can telnet to a different port listening on a secondary (internal) interface (same application, but bypassing the above rule) and get an immediate connection. However, establishing a connection 'to' a server behind this router can take a number of seconds, and sometimes never establishes at all. What's more, connecting directly, via the 'public' interface, to the port everything else is being redirected to can itself take some time (though not as long as connecting 'to' a system behind this box).

The conntrack table grows very quickly, but it does not clean itself up when a connection disappears; instead it waits for some pre-determined timeout value. This means that even though, as shown above, the number of connections in progress (one way or another) is about 8000, the conntrack table is absolutely huge (hundreds of thousands of entries), and the longer the system runs, the larger it gets.

To try to combat this, I've reduced the biggest timer (how long an established connection stays in conntrack) from 5 days to 6 hours (all the connections we have are, and should be, short-lived, so that is plenty of time). This helps a bit, but I'm still at a loss to understand why conntrack does not clean itself up when the connection gets closed.

Of course, even checking how big the conntrack table is impacts the system dramatically. The 'wc -l /proc/net/ip_conntrack' command takes a long time to run, and brings the 'active' processing to its knees while it does so.

By all appearances, conntrack is hamstringing us. It does not seem able to properly handle high-traffic systems, especially where essentially every connection going through the system is NAT'd.
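
To see which states the conntrack entries are sitting in, the table could be summarised in one pass instead of being read several times. The following is only a sketch under assumptions: conntrack_tcp_states is a made-up helper name, and it assumes the usual 2.4-era /proc/net/ip_conntrack layout in which TCP lines read "tcp 6 <seconds-left> <STATE> src=...", i.e. the state is the 4th field.

```shell
# Summarise conntrack entries by TCP state from ip_conntrack-style input on stdin.
# Assumption: TCP lines look like "tcp 6 <seconds-left> <STATE> src=... dst=...".
# conntrack_tcp_states is a hypothetical helper name, not an existing tool.
conntrack_tcp_states() {
    awk '$1 == "tcp" { n[$4]++; total++ }
         END { for (s in n) print s, n[s]; print "TOTAL", total + 0 }'
}

# Usage: take one snapshot, then analyse the copy rather than re-reading /proc.
# The snapshot path below is only an example.
#   cat /proc/net/ip_conntrack > /tmp/conntrack.snapshot
#   conntrack_tcp_states < /tmp/conntrack.snapshot
```

Taking the snapshot still walks the whole table once, so this does not avoid the cost described above; it only avoids paying it more than once per measurement.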

I'd appreciate ANY help or thoughts on how to remedy this issue. As I have said, this is in use in an enterprise environment, which of course means I cannot divulge the purpose of the application I mentioned earlier. All you really need to know is that the application does not know (or care) whether the connection to it is NAT'd, and just uses it as a standard socket connection. I can, however, give details on the system/kernel/netfilter configuration as necessary; just let me know what further information you require.

I thank you in advance, and apologise for mailing all 3 lists; I figured someone on ONE of these lists would have an idea or a suggestion.

- -- 
PreZ
Systems Administrator
Shadow Realm

PGP FingerPrint: B3 0C F3 32 DE 5A 7D 90 26 F6 FA 38 CC 0A 2D D8
Finger prez@xxxxxxxxxxxxx for full PGP public key.

Shadow Realm, a hobbyist ISP supplying real internet services.
http://www.srealm.net.au

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE+1ZcVKFp14D8AGEQRAh4jAJ9mhilrpVsDvakS03re/HsT1jcXcwCcDqFT
nHqa0y2UPb9s5JgRsGIhP8o=
=bmA5
-----END PGP SIGNATURE-----