conntrackd causes kernel panic

Rainer Sabelka <sabelka@xxxxxxxxxxxxxxxx> · Tue, 10 Jun 2008 13:43:11 +0200

Hi,

I posted the message below to the netfilter list yesterday.
Since Patrick asked that crash reports also be sent to netfilter-devel, I'm 
posting it here as well.
Please let me know if you need additional information.

-Rainer

-------------------------------------------------------------------

Hi,

I'm using conntrackd and keepalived (for a pair of redundant firewalls in 
active/backup configuration), and from time to time I experience kernel 
panics or other random system crashes.
I'm new to conntrackd, so it's likely that I've simply made some mistakes in 
my configuration.

I'm getting the crashes when keepalived switches the backup host to 
active.
I can trigger the kernel panic manually by executing "conntrackd -c" on the 
backup host (sometimes "conntrackd -c" executes successfully, but it crashes 
at the latest when I repeat the command a few times).

This is my setup:
* Ubuntu Linux with the distribution's kernel 2.6.24-18-server
* libnfnetlink 0.0.38 (compiled from sources)
* libnetfilter-conntrack 0.0.94 (compiled from sources)
* conntrack-tools 0.9.7 (compiled from sources)

My conntrackd.conf is attached below.

Does anybody have an idea why I get these crashes and what I could do to avoid 
them?

Best regards,
-Rainer

---- /etc/conntrackd.conf -----
Sync {
        Mode FTFW {
                ResendBufferSize 262144
                CommitTimeout 180
                ACKWindowSize 20
        }
        Multicast {
                IPv4_address 225.0.0.50
                IPv4_interface 10.0.1.204 # IP of dedicated link
                Interface eth0
                Group 3780
        }
        Checksum on
}
General {
        HashSize 8192
        HashLimit 65535
        LockFile /var/lock/conntrack.lock
        UNIX {
                Path /tmp/sync.sock
                Backlog 20
        }
        SocketBufferSize 262142
        SocketBufferSizeMaxGrown 655355
}
IgnoreTrafficFor {
        IPv4_address 127.0.0.1 # loopback
        IPv4_address 10.0.1.203
        IPv4_address 10.0.1.204
        IPv4_address 10.0.0.1
        IPv4_address 10.9.62.1
        IPv4_address 10.9.62.203
        IPv4_address 10.9.62.204
}
IgnoreProtocol {
        ICMP
        IGMP
        VRRP
}

---------------------------------------------

Some additional information:

I've now turned on logging to syslog in conntrackd.conf to see if I can get 
some more information on my problem.

1.) Now, I can see lots of the following messages in the syslog:

Jun  9 18:52:49 fw1b conntrack-tools[7385]: Received seq=1213034051 before 
expected seq=1213034052

2.) When I do "conntrackd -c" I get: 
Jun  9 18:52:50 fw1b conntrack-tools[10678]: committing external cache
Jun  9 18:52:50 fw1b conntrack-tools[10678]: commit: Invalid argument
[...]
Jun  9 18:52:50 fw1b conntrack-tools[10678]: commit: Cannot allocate memory
[...]
Jun  9 18:52:50 cfw1b conntrack-tools[10678]: Committed 2 new entries
Jun  9 18:52:50 cfw1b conntrack-tools[10678]: 89 entries can't be committed

3.) Since I turned on logging, "conntrackd -c" seems to be more stable. At 
first I thought my problem was fixed. But then I started a script which 
executed this command repeatedly in a loop, and it eventually triggered a 
kernel oops:

# while sleep 1 ; do conntrackd -c ; done
 fw1b kernel: [ 6714.379206] ------------[ cut here ]------------
 fw1b kernel: [ 6714.381285] invalid opcode: 0000 [#1] SMP
 fw1b kernel: [ 6714.388793] Process kjournald (pid: 2267, ti=c79ac000 
task=c5121140 task.ti=c79ac000)
 fw1b kernel: [ 6714.388824] Stack: c5c40a80 00000000 c5c40a80 00000000 
c1422000 c5121140 c50e0000 00000002
 fw1b kernel: [ 6714.389418]        c79adf84 c79adf7c 00000000 c04980e0 
c049b480 c049b480 c049b480 c79adf88
 fw1b kernel: [ 6714.390152]        00000286 c013b547 c79882ec ffffffff 
c79882ec 00000286 c013b5c5 00000286
 fw1b kernel: [ 6714.391701] Call Trace:
 fw1b kernel: [ 6714.392871]  [<c013b547>] lock_timer_base+0x27/0x60
 fw1b kernel: [ 6714.393652]  [<c013b5c5>] try_to_del_timer_sync+0x45/0x50
 fw1b kernel: [ 6714.394210]  [<c8ace740>] kjournald+0xa0/0x200 [jbd]
 fw1b kernel: [ 6714.394780]  [<c0145fc0>] autoremove_wake_function+0x0/0x40
 fw1b kernel: [ 6714.395354]  [<c8ace6a0>] kjournald+0x0/0x200 [jbd]
 fw1b kernel: [ 6714.395907]  [<c0145d02>] kthread+0x42/0x70
 fw1b kernel: [ 6714.396440]  [<c0145cc0>] kthread+0x0/0x70
 fw1b kernel: [ 6714.396994]  [<c010900b>] kernel_thread_helper+0x7/0x10
 fw1b kernel: [ 6714.397580]  =======================
 fw1b kernel: [ 6714.398150] Code: ff f3 90 8b 03 a9 00 00 20 00 0f 84 1f f5 
ff ff eb ef 0f 0b eb fe f3 90 8b 03 a9 00 00 20 00 0f 84 af f3 ff ff eb ef 0f 
0b eb fe <0f> 0b eb fe 0f 0b eb fe 56 53 89 d3 8d 34 90 eb 16 8d b4 26 00
 fw1b kernel: [ 6714.399468] EIP: [<c8acb958>] 
journal_commit_transaction+0xd88/0xd90 [jbd] SS:ESP 0068:c79adf2c

At first glance this oops seems to be unrelated because it happens within 
kjournald. But it is triggered by the conntrackd -c command, so I suspect 
(rather naively) that conntrackd calls some kernel function which corrupts 
some kernel memory (the stack?), causing a crash later on.
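
For anyone trying to reproduce this, the shell loop above can be rewritten so 
that it prints the iteration number before the machine goes down. The 
following is only a minimal sketch: the function name repeat_until_failure 
and its parameters are hypothetical, and it assumes that "conntrackd -c" 
reports a failed commit through a non-zero exit code, which has not been 
verified here.

```python
import subprocess
import time


def repeat_until_failure(cmd, max_runs=100, delay=1.0):
    """Run cmd up to max_runs times, stopping at the first non-zero exit.

    Returns (number_of_runs_done, last_exit_code).
    """
    code = 0
    for run in range(1, max_runs + 1):
        code = subprocess.run(cmd).returncode
        # Flush immediately so the last completed run is still visible
        # on the console if the machine crashes right afterwards.
        print(f"run {run}: exit code {code}", flush=True)
        if code != 0:
            return run, code
        time.sleep(delay)
    return max_runs, code
```

On the backup host it would be called as 
repeat_until_failure(["conntrackd", "-c"]), which corresponds to the 
"while sleep 1" loop with a limit of 100 runs.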

Does anybody have a hint as to what could be wrong with my setup?
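
To make the commit errors from item 2 easier to compare between runs, they 
can be tallied per error string. This is a minimal sketch that assumes the 
syslog format shown in the excerpt above; the function name 
tally_commit_errors is hypothetical.

```python
import re
from collections import Counter

# Pattern assumed from the excerpt: "... conntrack-tools[PID]: commit: REASON"
COMMIT_ERROR = re.compile(r"conntrack-tools\[\d+\]: commit: (?P<reason>.+?)\s*$")


def tally_commit_errors(lines):
    """Count 'commit: <reason>' log messages, grouped by reason."""
    counts = Counter()
    for line in lines:
        match = COMMIT_ERROR.search(line)
        if match:
            counts[match.group("reason")] += 1
    return counts
```

Passing an open log file (for example /var/log/syslog, an assumed location) 
returns a Counter keyed by error string, such as "Invalid argument" or 
"Cannot allocate memory".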

Best regards,
-Rainer