> Given your numbers of 8000 cps and the above comments it would seem
> that we are well within any types of overload issues with any decent
> off the shelf server equipped with two dual core CPUs and the
> necessary memory. If I allocate 500 bytes per connection at the max
> connections I would need ~87Mb + machine overhead. That's not much in
> today's world of servers.

I would say so, unless someone on the list says NAT has completely different performance requirements from the connection-tracking-only machines. But I did do some tests to find the breaking points of such machines some time ago (see below), and there should be plenty of resources left for any additional NAT requirements, given your numbers.

As for memory, we are using 4GB RAM on our high-performance machines (access throughput/latency is important here) with a 2GB kernel / 2GB userspace split, in order to allow for huge firewall rulesets and to have Linux use larger default sizes for various network caches (without us having to fiddle with the settings).

Thomas

==== old test results ====

[..]

What we did is run a system

A)
CPU Intel® XEON(TM) E3110 3000MHz 6MB FSB1333 S775 2x
RAM DDR2 2GB PC667 Kingston ECC
NET INTEL Pro1000PT 1GBit 2xRJ45 NIC Dual Server
MB  SuperMicro X7SBi, Intel® 3210 + ICH9R chipset,
    Intel® 82573V + Intel® 82573L PCI-E Gigabit controllers

against a system

B)
CPU AMD Opteron 2220 2.8GHz DualCore Socket F
RAM 4x DDR2 1GB PC667 Kingston ECC-Reg CL5 with Parity Dual Rank
    + 2x DDR2 1GB / ECC / CL5 / 667MHz / with Parity / Dual Rank
NET INTEL Pro1000PT 1GBit 2xRJ45 NIC Dual Server
MB  Tyan Thunder h2000m (S3992G3NR-RS) DUAL SKT F EATX

I ran pktgen both generating a single 64-byte-packet UDP stream and generating 8192 parallel flows of flowlen 4 with randomized dst/src IPs and ports (also UDP, 64-byte packets), so that the number of conntrack entries stabilized at almost 512k (most of them timing out, of course).
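For reference, a pktgen configuration along the lines of the multi-flow test described above can be sketched roughly like this. This is only a sketch: the interface name, address ranges, and port ranges are placeholders, not the values actually used in the test.

```shell
# Sketch of a pktgen setup approximating the second test case:
# 8192 parallel flows, flowlen 4, randomized src/dst IPs and ports,
# 64-byte UDP packets. eth1 and the ranges below are placeholders.
PGDEV=/proc/net/pktgen/eth1

pgset() { echo "$1" > $PGDEV; }

pgset "pkt_size 64"          # 64-byte packets
pgset "count 0"              # run until stopped
pgset "flag IPSRC_RND"       # randomize source IPs
pgset "src_min 10.0.0.1"
pgset "src_max 10.0.255.254"
pgset "flag IPDST_RND"       # randomize destination IPs
pgset "dst_min 10.1.0.1"
pgset "dst_max 10.1.255.254"
pgset "flag UDPSRC_RND"      # randomize UDP ports
pgset "udp_src_min 1024"
pgset "udp_src_max 65535"
pgset "flag UDPDST_RND"
pgset "udp_dst_min 1024"
pgset "udp_dst_max 65535"
pgset "flows 8192"           # 8192 concurrent flows
pgset "flowlen 4"            # 4 packets per flow before re-randomizing

echo "start" > /proc/net/pktgen/pgctrl
```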
The result was that the Opteron system is essentially as fast as the Xeon system if you have just a single flow, but for the second, more realistic test case, the Xeon system was faster by about 10-20%, probably due to its much larger CPU cache.

RX/TX flow control was enabled, and iptables and connection tracking were loaded. The incoming and outgoing interfaces each had their smp_affinity set to a single CPU core. The kernel was 2.6.23.14, with the e1000 driver version that was current in Feb 2008.

As a ruleset, I had 2 chain trees for 8192 IPs each, for ingress and egress; each IP had 10 non-matching rules associated with it, but this ruleset was only searched for --state NEW of course... resulting in about 13*2=26 chain jumps and (13+10)*2=46 matches per NEW packet. (I had ~32k chains and ~210k rules.)

Unfortunately I only have the results for the Xeon system; the Opteron data got lost somehow ;-(

1 stream     / default buffers  eth0:eth1  735kpps
500k streams / default buffers  eth0:eth1  254kpps

But those numbers are obviously not comparable to yours... so...

[..]

============= snip ======
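The per-interface interrupt pinning mentioned above (one CPU core per NIC via smp_affinity) can be done roughly like this. The IRQ numbers below are placeholders; the real ones have to be looked up in /proc/interrupts on the machine in question.

```shell
# Sketch: pin each NIC's interrupt to its own core via smp_affinity.
# IRQ numbers 24 and 25 are placeholders; check /proc/interrupts.
grep eth /proc/interrupts

echo 1 > /proc/irq/24/smp_affinity   # eth0 -> core 0 (CPU bitmask 0x1)
echo 2 > /proc/irq/25/smp_affinity   # eth1 -> core 1 (CPU bitmask 0x2)
```

Note that irqbalance, if running, may overwrite these settings.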