iptables/conntrack in an enterprise environment

Hi,

I am in an enterprise environment and I'm having some problems with conntrack 
specifically.

We have a system that acts as a router; however, any new inbound connection for 
any machine behind this router is redirected to a specific port on the local 
machine, where an application responds as if it were the system behind the 
router.  These systems experience very high volumes of traffic (sustaining 
over 30 Mbit/s).  Here's a breakdown of TCP socket connections by state at 
one particular point in time:
ESTABLISHED : 1363
LAST_ACK    : 27
TIME_WAIT   : 616
FIN_WAIT2   : 8
FIN_WAIT1   : 140
SYN_RECV    : 6188
CLOSE_WAIT  : 365
LISTEN      : 3
CLOSING     : 5
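
A breakdown like the one above can be produced with something like the 
following (assuming net-tools' netstat is available):
# summarise TCP sockets by state, skipping netstat's two header lines
netstat -ant | awk 'NR > 2 {print $6}' | sort | uniq -c | sort -rn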

We have multiple systems performing this task (essentially for load balancing 
and to remove a single point of failure).  The systems are dual 1 GHz Pentium 
IIIs with 1-2 GB of RAM, so they're not shy systems.  They're running 2.4.20 
kernels (mostly vanilla) with iptables 1.2.7a.

Here are some system limits I am tweaking (i.e., the commands that do the 
tweaking):
echo 1 >/proc/sys/net/ipv4/ip_forward
echo 524280 >/proc/sys/fs/file-max
echo 524280 >/proc/sys/net/ipv4/ip_conntrack_max
echo 65535 >/proc/sys/net/ipv4/ip_queue_maxlen
echo 65535 >/proc/sys/net/ipv4/tcp_max_syn_backlog
# conntrack TCP timeouts, in seconds:
#     NONE    EST     SYN_S   SYN_R   FIN_W   TIME_W  CLOSE   CLOSE_W LAST_A  LISTEN
echo "1800    21600   120     60      30      30      10      30      30      120" >/proc/sys/net/ipv4/ip_conntrack_tcp_timeouts
ulimit -H -n 524280
ulimit -S -n 524280
iptables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu


Every new connection to one of the systems 'behind' these routers needs to be 
redirected to a local port, which is achieved with the command:
/sbin/iptables -t nat -A PREROUTING -i eth0 -p tcp -d <ip-range> --destination-port 1024:65535 -j DNAT --to-destination <local_ip>:<local_port>
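
(For reference, the packet/byte counters on this rule, which give a rough 
feel for how much traffic the redirect is handling, can be watched with:
iptables -t nat -L PREROUTING -n -v
)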

Every inbound connection incurs an entry in the connection tracking table.  It 
seems, however, that we may be overloading the conntrack system.  I can 
telnet to a different port listening on a secondary (internal) interface (but 
the same application), which bypasses the above rule, and get an immediate 
connection; however, establishing a connection 'to' a server behind this 
router can take a number of seconds, and sometimes may never establish.  
What's more, connecting directly to the port everything else is being 
redirected to, via the 'public' interface itself, can take some time (though 
not as long as connecting 'to' a system behind this box).
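
(For what it's worth, when the conntrack table actually overflows, the 2.4 
kernel logs "ip_conntrack: table full, dropping packet", so something like 
the following should show whether we are hitting that hard limit:
dmesg | grep -i ip_conntrack
)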

The conntrack table itself grows very quickly, but it does not clean itself 
up when the connection itself disappears; instead it waits for some 
pre-determined timeout value.  This means that even though, as shown above, 
the number of connections in progress (one way or another) is about 8,000, 
the conntrack table is absolutely huge (hundreds of thousands of entries), 
and it only gets larger as time goes on.  To try to combat this, I've reduced 
the biggest timer (how long an established connection stays in conntrack) 
from 5 days to 6 hours (all the connections we have are, and should be, 
short-lived, so that is plenty of time).  This helps a bit; however, I'm 
still at a loss to understand why conntrack does not clean itself up when 
the connection gets closed.
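
(To see which TCP states those stale entries are sitting in, something like 
the following summarises /proc/net/ip_conntrack; field 4 of a "tcp" line is 
the tracked state.  Note it reads the whole table, so it is as slow as the 
wc mentioned below:
awk '$1 == "tcp" {print $4}' /proc/net/ip_conntrack | sort | uniq -c | sort -rn
)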

Of course, just seeing how big the conntrack table is impacts the system 
dramatically.  The 'wc -l /proc/net/ip_conntrack' command takes a long time 
to run, and brings the 'active' processing to its knees while doing so. 
From all appearances, conntrack is hamstringing us.  It appears it is not 
able to properly handle large-traffic systems, especially where essentially 
every connection going through the system is NATed.
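
(One thing I have not tweaked, in case it is relevant: as I understand it, 
the hash table behind conntrack is sized only when the ip_conntrack module 
loads, so at the default (small) hash size, every lookup in a table of 
hundreds of thousands of entries degenerates into long linked-list walks.  
Assuming conntrack is built as a module, it would be set with something like:
# hashsize value here is illustrative, not something we currently run
modprobe ip_conntrack hashsize=131072
)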

I'd appreciate ANY help or thoughts on how to remedy this issue.  As I have 
said, this is in use in an enterprise environment, which of course means I 
cannot divulge the purpose of the application I mentioned earlier (all you 
really need to know is that the application does not know, or care, whether 
the connection to it is NATed, and just treats it as a standard socket 
connection).  I can, however, give details on the system/kernel/netfilter 
configuration as necessary; just let me know what further information you 
require.

I thank you in advance, and apologise for mailing all three lists, but I 
figured someone on ONE of these lists would have an idea or a suggestion.

-- 
PreZ
Systems Administrator
Shadow Realm

PGP FingerPrint: B3 0C F3 32 DE 5A 7D 90  26 F6 FA 38 CC 0A 2D D8
Finger prez@xxxxxxxxxxxxx for full PGP public key.

Shadow Realm, a hobbyist ISP supplying real internet services.
http://www.srealm.net.au



