Re: conntrackd high cpu usage

Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> · Mon, 16 Jan 2012 23:58:10 +0100

On Mon, Jan 16, 2012 at 08:53:23PM +0100, Stefan Majer wrote:
> Hi Pablo,
> 
> On Mon, Jan 16, 2012 at 12:28 PM, Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> > Hi Stefan,
> >
> > On Mon, Jan 09, 2012 at 07:49:55PM +0100, Stefan Majer wrote:
> >> Hi,
> >>
> >> we have 2 8-core Xeon boxes with 2 Intel X520 10GBit adapters running
> >> RHEL 6.1 as a redundant firewall.
> >
> > Interesting setup. So far, the reports of conntrackd usage that
> > I've received are deployments with 1GBit NICs and smaller machines
> > (up to 2-4 cores).
> >
> >> On every node we have conntrackd installed in FTFW mode, and we
> >> synchronize all states.
> >> Synchronization is done over multicast on a dedicated vlan interface.
> >> The firewall itself has around 300 vlans active.
> >>
> >> Currently we see a steady ~400 new connections/sec with peaks at 800
> >> conn/sec.
> >
> > I've been able to reach up to 20000 sessions/sec with six-year-old
> > hardware (dual core, 2.4GHz, 1Gbit links). I know people that
> > got better results on more modern hardware.
> 
> This would be sufficient for our use case but...
> 
> >
> > You may want to enable the reliable synchronization option in
> > conntrackd. With it, conntrackd starts dropping packets if the
> > synchronization does not happen timely.
> 
> This is probably not what we want, as this prevents a working state on
> the secondary machine at any time, right?

Reliable synchronization means that we drop network packets on the
primary if we cannot back off, i.e. when the rate of state changes per
second is so high that conntrackd starts dropping state-change events
coming from the kernel.

See the NetlinkEventsReliable option.

> >> With this load the conntrackd consumes about 15 - 25 % CPU from one
> >> CPU on the active side and about 5% CPU usage on the passive side.
> >> Is this expected ?
> >
> > What tool are you using to obtain those measurements?
> 
> This was actually measured with top.
> 
> > top is fine for estimated load, but it's inaccurate.

sysstat is a simple tool and it's a bit better.

> > Still, full state synchronization is a resource consuming task
> 
> Is it possible to restrict the synchronization to specific state
> events, ESTABLISHED and NEW for example,
> without losing a working state on the secondary side?

Yes, please have a look at the conntrack-tools user manual.
See the CT target in iptables.

> >> This is our Testing environment, and we expect much higher (~10 - 20
> >> times) connection rates.
> >>
> >> This would not be possible with the current setup, as this would be
> >> cpu bound on the conntrackd, as this daemon is single threaded.
> >> Is there any way to make this process faster, eg. make the
> >> synchronization multi threaded ?
> >
> > There are several things that we can do to improve conntrackd performance
> > (from the development side):
> >
> > 1) port conntrackd to libmnl to use recvmmsg system call.
> > 2) implement netlink multi-queue, we discussed this during the
> > NFWS2010. The idea is to implement something similar to the existing
> > nfqueue multiqueue load balancing (see --queue-balance in iptables's
> > NFQUEUE). It's similar to multi-threading that you're proposing.
> > 3) implement batching for the commit operation.
> >
> > So far, nobody has shown interest in these tasks. Recent
> > enhancements for conntrackd have focused on adding new features.
> 
> This all sounds great, but I have no idea how much this would increase
> performance.
> We will first try to measure how many conn/sec we are able to
> synchronize in our current environment.

I don't have numbers because it's not implemented yet ;-), but I'm
sure this will boost performance considerably.

Using recvmmsg will reduce the huge number of recv system calls that
conntrackd issues under heavy load to receive state-change events
from kernel-space.
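
A minimal sketch of the recvmmsg idea follows. It is not conntrackd code:
the function name recv_batch and the BATCH and BUFSZ sizes are assumptions
made for illustration, and any datagram socket can stand in for the netlink
event socket. The point is only that one system call can drain several
queued messages.

```c
#define _GNU_SOURCE             /* recvmmsg() is a Linux-specific extension */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#define BATCH 16                /* hypothetical batch size */
#define BUFSZ 256               /* hypothetical per-message buffer size */

/*
 * Receive up to BATCH datagrams from fd with a single system call.
 * bufs[i] receives message i and lens[i] its length in bytes.
 * Returns the number of messages received, or -1 on error
 * (including EAGAIN when nothing is queued).
 */
int recv_batch(int fd, char bufs[BATCH][BUFSZ], unsigned int lens[BATCH])
{
	struct mmsghdr msgs[BATCH];
	struct iovec iov[BATCH];
	int i, n;

	assert(fd >= 0);
	memset(msgs, 0, sizeof(msgs));
	for (i = 0; i < BATCH; i++) {
		iov[i].iov_base = bufs[i];
		iov[i].iov_len = BUFSZ;
		msgs[i].msg_hdr.msg_iov = &iov[i];
		msgs[i].msg_hdr.msg_iovlen = 1;
	}

	/* MSG_DONTWAIT: take what is already queued instead of blocking
	 * until all BATCH slots are filled. */
	n = recvmmsg(fd, msgs, BATCH, MSG_DONTWAIT, NULL);
	for (i = 0; i < n; i++)
		lens[i] = msgs[i].msg_len;
	return n;
}
```

With plain recv, the same three queued events would cost three system
calls; here they cost one.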

The multiqueue approach will let it scale to a high number of
processors / cores.
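
As a rough illustration of the load-balancing idea (not an existing
netlink or conntrackd interface), events could be spread over several
queues by hashing the flow, so that all events of one flow reach the same
worker. The struct flow_key layout, the function names and the choice of
the FNV-1a hash are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key; a real one would come from the conntrack tuple. */
struct flow_key {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
	uint8_t  proto;
};

/* Mix one 32-bit word into an FNV-1a hash, one byte at a time. */
static uint32_t fnv1a_word(uint32_t h, uint32_t w)
{
	int i;

	for (i = 0; i < 4; i++) {
		h ^= (w >> (8 * i)) & 0xff;
		h *= 16777619u;         /* FNV prime */
	}
	return h;
}

/* Map a flow to a queue index in [0, nqueues). The same key always
 * yields the same index, so per-flow event ordering is kept. */
unsigned int flow_to_queue(const struct flow_key *k, unsigned int nqueues)
{
	uint32_t h = 2166136261u;       /* FNV offset basis */

	assert(k != 0 && nqueues > 0);
	h = fnv1a_word(h, k->src_ip);
	h = fnv1a_word(h, k->dst_ip);
	h = fnv1a_word(h, ((uint32_t)k->src_port << 16) | k->dst_port);
	h = fnv1a_word(h, k->proto);
	return h % nqueues;
}
```

Each queue could then be served by its own process or thread, one per
core.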

The batching will allow us to reduce the time to inject the states
into the kernel.

> >> I already did some perf analysis, but it didn't shed much light.
> >
> > What tools are you using?
> 
> we were using perf record, see man 1 perf.
> 
> > I suggest you have a look at Willy Tarreau's tool (httpterm). You
> > may want to use my http client instead of inject32.
> >
> > http://1984.lsi.us.es/git/http-client-benchmark/
> 
> I will check both, but yours won't compile:
> 
> make
> gcc -g -c alarm.c -o alarm.o
> gcc -g -c client.c -o client.o
> client.c: In function ‘print_alarm_cb’:
> client.c:335:3: warning: format ‘%llu’ expects argument of type ‘long
> long unsigned int’, but argument 5 has type ‘uint64_t’ [-Wformat]
> client.c:335:3: warning: format ‘%u’ expects argument of type
> ‘unsigned int’, but argument 10 has type ‘__time_t’ [-Wformat]
> client.c:335:3: warning: format ‘%u’ expects argument of type
> ‘unsigned int’, but argument 11 has type ‘__suseconds_t’ [-Wformat]
> client.c: In function ‘main’:
> client.c:404:5: error: variable-sized object may not be initialized
> make: *** [all] Error 1

Interesting, I don't hit that problem here.

I have applied one fix to git. Let me know if it compiles now.
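
For reference, a generic sketch of how diagnostics of this kind are
usually addressed. It does not reproduce client.c or the fix that went
into git; the function names and the message format are made up for the
example. The warnings go away with PRIu64 from <inttypes.h> and explicit
casts for the timeval fields, and the error is avoided by zeroing a
variable-length array with memset instead of an initializer.

```c
#include <assert.h>
#include <inttypes.h>           /* PRIu64 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

/* Print a 64-bit counter and a timeval without -Wformat warnings on
 * either 32-bit or 64-bit builds. Returns snprintf's result. */
int format_stats(char *out, size_t len, uint64_t bytes,
		 const struct timeval *tv)
{
	assert(out != NULL && tv != NULL);
	return snprintf(out, len, "%" PRIu64 " bytes in %ld.%06ld s",
			bytes, (long)tv->tv_sec, (long)tv->tv_usec);
}

/* Declare a variable-length array and clear it afterwards.
 * Returns 1 if every element is zero, 0 otherwise. */
int vla_is_zeroed(int n)
{
	int i;

	assert(n > 0);
	int v[n];               /* `int v[n] = { 0 };` is what gcc rejects */
	memset(v, 0, sizeof(v));
	for (i = 0; i < n; i++)
		if (v[i] != 0)
			return 0;
	return 1;
}
```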

This tool is quite rudimentary and undocumented, and I think I'm the
only one using it, for my benchmark evaluations. But it's very useful.