Hello,
I have recently set up a pair of Dell PowerEdge R610 servers (Xeon
X5650, 8GB RAM) for active-backup firewall duty. I've installed
conntrack-tools-1.0.1 and libnetfilter_conntrack-1.0.0 and am using the
FTFW mode for synchronization across a dedicated gigabit interface. The
active firewall has to contend with fairly heavy traffic, much of which
is in the form of long-lived TCP connections to an internal (LVS) load
balancer, behind which a bunch of application servers sit.
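For reference, an FTFW Sync section generally looks something like the
following (the address, group and interface names here are illustrative
placeholders rather than my exact values; my actual conntrackd.conf is
linked further down):

Sync {
    Mode FTFW {
    }
    Multicast {
        IPv4_address 225.0.0.50
        Group 3780
        IPv4_interface 192.168.100.1
        Interface eth2
        SndSocketBuffer 1249280
        RcvSocketBuffer 1249280
        Checksum on
    }
}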
The number of active, concurrent connections to this service peaks at
around 480,000. At last count, the number of conntrack states was
785,785, which is typical. I have net.nf_conntrack_max set to 1048576 and
the nf_conntrack module is loaded with hashsize=262144. The firewall is
fully stateful in that new connections must match --ctstate NEW. I'm
also using "-t raw -A PREROUTING -j CT --ctevents assured" as mentioned
in the docs.
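Concretely, the relevant bits amount to something like the following
(the iptables rules and the LVS address are simplified placeholders; the
real sysctl.conf and kernel config are linked further down):

# module parameter, e.g. in /etc/modprobe.d/nf_conntrack.conf
options nf_conntrack hashsize=262144

# sysctl.conf
net.nf_conntrack_max = 1048576

# iptables (simplified)
iptables -t raw -A PREROUTING -j CT --ctevents assured
iptables -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -m conntrack --ctstate NEW -d <lvs-vip> -j ACCEPT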
This is my current test case for the backup:-
1) Boot the system and start conntrackd
2) Run conntrackd -n to sync with the active firewall
3) Run conntrackd -c to commit the states from the external cache
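In shell terms, that's roughly the following (the last two commands are
only there to inspect the result and aren't part of the failure path):

conntrackd -d     # start the daemon on the backup
conntrackd -n     # request a full resync from the active node
conntrackd -c     # commit the external cache into the kernel table
conntrackd -s     # cache statistics
conntrack -C      # number of entries now in the kernel table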
Originally, while conntrackd -c was performing its work, I would
experience protracted soft lockups. After some investigation, I noticed
that conntrackd was trying to commit more states than
net.nf_conntrack_max, which, in turn, led me to this patch:-
https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=af14cca
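The mismatch is easy to observe by watching the kernel counters while
the commit is in progress, e.g.:

# run repeatedly (e.g. under watch) while conntrackd -c is running
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max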
Although Jozsef's patch was helpful, I'm still experiencing a nasty
kernel oops after conntrackd -c has finished executing. This always
occurs within 15 seconds or so - sometimes immediately. Here's a recent
netconsole trace from 3.3-rc5 + patch:-
http://paste.pocoo.org/raw/559736/
Though I ultimately intend to use the 3.0 kernel, I tried various other
versions going as far back as 2.6.32. In each case, an oops is
reproducible - though the details do vary. Using 3.3-rc5, I even noticed
a null ptr deref on one occasion. Alas, I was unable to capture it at
the time.
Here's some other configuration information which may be useful ...
conntrackd.conf: http://paste.pocoo.org/raw/559727/
sysctl.conf: http://paste.pocoo.org/raw/559726/
kernel .config: http://paste.pocoo.org/raw/559725/
It's perhaps worth noting that I followed the advice to set HashLimit in
conntrackd.conf to at least double net.nf_conntrack_max (it's commented
out in my config because I was experimenting with the issue that
Jozsef's patch rectifies). One thing that puzzles me is why conntrackd
always tries to commit more state entries than can be accommodated. On
the master, the internal cache grows to the maximum size and, afaict,
nothing is ever expired. This is from the master which has been up for a
while ...
# conntrackd -s | head -n 5
cache internal:
current active connections: 2097152
connections created: 31649757 failed: 234788761
connections updated: 105516073 failed: 0
connections destroyed: 29552605 failed: 0
# conntrack -S | head -n1
entries 792495
It seems that the cache usage grows to the maximum, at which point the
creation failed counter starts going skyward. On the backup, it seems
that conntrackd -n && conntrackd -c tries to commit all of this, but I
don't really understand why.
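For completeness, the HashLimit recommendation mentioned above amounts
to something like this in the General section (values illustrative; as I
said, the line is currently commented out in my real config):

General {
    HashSize 262144
    HashLimit 2097152   # at least 2 x net.nf_conntrack_max
}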
Any advice would be most welcome. I can't tinker too much with the
active firewall at this point but, if it helps, I can conduct any number
of tests with the backup.
Cheers,
--Kerin