KERNEL: assertion (!atomic_read(&sk->sk_wmem_alloc)) failed at net/ipv4/af_inet.c (146)

Mike Allport <mwallport@xxxxxxxxx> · Wed, 18 Nov 2009 18:10:42 -0800

This email outlines some kernel panics (oops) we've been getting and
would like to resolve.

My question is: is there a patch or set of patches for the 2.6.14
kernel that is known to resolve the oopses shown in 1) and 2)?

Unfortunately, I don't have the luxury of moving to a more recent kernel.

Background of our problem:
----------------------------------------
With the 2.6.14.7 kernel, we were frequently getting the oopses shown
in 1) and 2) when running SCTP traffic to our server.

In an effort to resolve the oopses shown in 1) and 2), we back-ported
only the SCTP parts of the official Linux patches, up to the 2.6.23
release, found at ftp.kernel.org.  The patching seems to have resolved
the oopses in 1) and 2), but it introduced another set of oopses,
shown in 3) and 4) below, which don't happen as often.

1)  This is the oops we got with the 2.6.14.7 kernel, which is not
patched in the networking or SCTP areas.  I have found some Google hits
on the string at the bottom of this oops, "KERNEL: assertion
(!atomic_read(&sk->sk_wmem_alloc)) failed at net/ipv4/af_inet.c (146)",
but none of them offered a resolution.

atcafs-n0s11:~# Oops: 0000 [#1]
SMP
LTT NESTING LEVEL : 0
Modules linked in: sctp ip_queue iptable_filter ip_tables bonding
loop ohci_hcd i2c_i801 i2c_core ehci_hcd ipmi_watchdog ipmi_si
ipmi_devintf ipmi_msghandler softdog video thermal processor fan
button battery ac
CPU:    1
EIP:    0060:[<f89f2e6f>]    Not tainted VLI
EFLAGS: 00010282   (2.6.14.7-selinux1-WR1.4aq_cgl)
EIP is at sctp_getsockopt_sctp_status+0x100/0x1de [sctp]
eax: 00000000   ebx: 000000b0   ecx: 00000000   edx: 00000000
esi: d5e02000   edi: d64f1640   ebp: d93f2e78   esp: d93f2dac
ds: 007b   es: 007b   ss: 0068
Process upis (pid: 4388, threadinfo=d93f2000 task=d8d708d0)
Stack: 00000000 00000000 87163ef0 d64f1640 00000000 00000001 0000ffff 00000000
       00200020 00000000 36fc0002 00000000 00000000 00000000 00000000 00000000
       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Call Trace:
 [<c0103fad>] show_stack+0x7a/0x90
 [<c010412b>] show_registers+0x14f/0x1c7
 [<c010516b>] die+0x11a/0x195
 [<c042b54b>] do_page_fault+0xa77/0x360d
 [<c0103cdb>] error_code+0x4f/0x54
 [<f89f43f5>] sctp_getsockopt+0x1ef/0x2a5 [sctp]
 [<c03aabb7>] sock_common_getsockopt+0x22/0x2c
 [<c03a7f6b>] sys_getsockopt+0x49/0x82
 [<c03a8e22>] sys_socketcall+0xa5a/0xa9b
 [<c04239c4>] no_syscall_entry_trace+0xb/0xf
Code: ff ff ff 0f b7 86 9e 00 00 00 66 89 85 54 ff ff ff 0f b7 86 9c 00 00 00 66 89 85 56 ff ff ff 8b 86 8c 13 00 00 89 85 58 ff ff ff <8b> 42 30 31 d2 85 c0 74 03 8b 50 7c 8b b5 38 ff ff ff 89 95 5c
idr_remove called for id=700 which is not allocated.
 [<c0103fda>] dump_stack+0x17/0x19
 [<c029a158>] idr_remove_warning+0x1b/0x1d
 [<c029a241>] sub_remove+0xe7/0xe9
 [<c029a266>] idr_remove+0x23/0x87
 [<f89e8be1>] sctp_association_destroy+0x64/0xa3 [sctp]
 [<f89e9101>] sctp_association_put+0x19/0x1b [sctp]
 [<f89e9377>] sctp_assoc_bh_rcv+0xd1/0x105 [sctp]
 [<f89ed9ce>] sctp_inq_push+0x18/0x1a [sctp]
 [<f89f6660>] sctp_backlog_rcv+0x11/0x15 [sctp]
 [<c03aa40b>] __release_sock+0x47/0x6a
 [<c03aaac8>] release_sock+0x55/0x90
 [<f89f170d>] sctp_close+0xa6/0x111 [sctp]
 [<c03efd50>] inet_release+0x37/0x5b
 [<c03a4ac7>] sock_release+0x4c/0x9f
 [<c03a6a54>] sock_close+0x21/0x3d
 [<c017a4cf>] __fput+0x147/0x172
 [<c017a386>] fput+0x19/0x1b
 [<c01731fd>] filp_close+0x3c/0x75
 [<c0173589>] sys_close+0x353/0x7a9
 [<c04239c4>] no_syscall_entry_trace+0xb/0xf
KERNEL: assertion (!atomic_read(&sk->sk_wmem_alloc)) failed at net/ipv4/af_inet.c (146)

2)  Below is another flavor of oops we've seen logged to the serial
port while running the 2.6.14 kernel (not patched in the networking or
SCTP areas):

atcafs-n0s6:~# Oops: 0000 [#1]
SMP
LTT NESTING LEVEL : 0
Modules linked in: sctp ip_queue iptable_filter ip_tables bonding loop
ohci_hcd i2c_i801 i2c_core ehci_hcd ipmi_watchdog ipmi_si ipmi_devintf
ipmi_msghandler softdog video thermal processor fan button battery ac
CPU:    1
EIP:    0060:[<f89f2e6f>]    Not tainted VLI
EFLAGS: 00010282   (2.6.14.7-selinux1-WR1.4aq_cgl)
EIP is at sctp_getsockopt_sctp_status+0x100/0x1de [sctp]
eax: 00000000   ebx: 000000b0   ecx: 00000000   edx: 00000000
esi: d8a7c000   edi: d9094940   ebp: d8cbae78   esp: d8cbadac
ds: 007b   es: 007b   ss: 0068
Process upis (pid: 4177, threadinfo=d8cba000 task=d8d53830)
Stack: 00000000 00000000 87180ef0 d9094940 00000000 00000001 0000ffff 00000000
       00200020 00000000 72ed0002 00000000 00000000 00000000 00000000 00000000
       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Call Trace:
 [<c0103fad>] show_stack+0x7a/0x90
 [<c010412b>] show_registers+0x14f/0x1c7
 [<c010516b>] die+0x11a/0x195
 [<c042b54b>] do_page_fault+0xa77/0x360d
 [<c0103cdb>] error_code+0x4f/0x54
 [<f89f43f5>] sctp_getsockopt+0x1ef/0x2a5 [sctp]
 [<c03aabb7>] sock_common_getsockopt+0x22/0x2c
 [<c03a7f6b>] sys_getsockopt+0x49/0x82
 [<c03a8e22>] sys_socketcall+0xa5a/0xa9b
 [<c04239c4>] no_syscall_entry_trace+0xb/0xf
Code: ff ff ff 0f b7 86 9e 00 00 00 66 89 85 54 ff ff ff 0f b7 86 9c 00 00 00 66 89 85 56 ff ff ff 8b 86 8c 13 00 00 89 85 58 ff ff ff <8b> 42 30 31 d2 85 c0 74 03 8b 50 7c 8b b5 38 ff ff ff 89 95 5c

atcafs-n0s6:~#
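For reference, the byte marked `<8b>` in the Code: line of 1) and 2) is the first byte of the faulting instruction. As a rough sketch (helper names are hypothetical, and only the single IA-32 addressing form needed here is handled, not a general disassembler), `8b 42 30` decodes as `mov eax, [edx+0x30]`. With `edx: 00000000` in the register dump, the load would read address 0x30, which is consistent with a NULL pointer dereference inside sctp_getsockopt_sctp_status.

```python
# Register names indexed by the 3-bit reg/rm fields of an IA-32 ModRM byte.
REG32 = ("eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi")


def decode_mov_load(code: bytes) -> tuple[str, str, int]:
    """Decode 'mov r32, [base + disp8]' (opcode 0x8b with ModRM mod == 0b01).

    This sketch handles only that one form and raises ValueError otherwise.
    Returns (destination register, base register, signed displacement).
    """
    if len(code) < 3 or code[0] != 0x8B:
        raise ValueError("not an 8b /r instruction")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:  # rm == 4 would introduce a SIB byte
        raise ValueError("addressing form not handled by this sketch")
    disp = int.from_bytes(code[2:3], "little", signed=True)
    return REG32[reg], REG32[rm], disp


def fault_address(code: bytes, regs: dict[str, int]) -> int:
    """Effective address the decoded load reads, wrapped to 32 bits."""
    _, base, disp = decode_mov_load(code)
    return (regs[base] + disp) & 0xFFFFFFFF


# Register values copied from the dump in oops 1).
regs = {
    "eax": 0x00000000, "ebx": 0x000000B0, "ecx": 0x00000000, "edx": 0x00000000,
    "esi": 0xD5E02000, "edi": 0xD64F1640, "ebp": 0xD93F2E78, "esp": 0xD93F2DAC,
}
faulting = bytes.fromhex("8b4230")  # the <8b> 42 30 bytes from the Code: line
dst, base, disp = decode_mov_load(faulting)
print(f"mov {dst}, [{base}{disp:+#x}] -> reads 0x{fault_address(faulting, regs):08x}")
```

Running it prints `mov eax, [edx+0x30] -> reads 0x00000030`. For anything beyond this one instruction form, a real disassembler such as objdump is the better tool.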

3)  After porting the SCTP parts of the official ftp.kernel.org patches,
up to the 2.6.23 release, to our kernel, we now get these oopses...

The following oops did not lock up the computer, but it did stop the
computer from accepting SCTP associations (every association attempt
from a client was answered with an ABORT).

atcafs-n0s5:~# Oops: 0000 [#1]
SMP
LTT NESTING LEVEL : 0
Modules linked in: sctp ip_queue iptable_filter ip_tables bonding loop ohci_hcdc
CPU:    1
EIP:    0060:[<f89f8c8c>]    Not tainted VLI
EFLAGS: 00010246   (2.6.14.7-selinux1-WR1.4aq_cgl)
EIP is at sctp_getsockopt_sctp_status+0xe8/0x1f7 [sctp]
eax: 00000000   ebx: 00000000   ecx: 00000000   edx: 00000000
esi: d81d0000   edi: d70cb700   ebp: d896fe78   esp: d896fdac
ds: 007b   es: 007b   ss: 0068
Process upis (pid: 22955, threadinfo=d896f000 task=d8cab5b0)
Stack: 00000000 870f3ed0 000000b0 d70cb700 00000000 00000001 0000ffff 00000000
       00200020 00000000 60450002 00000000 00000000 00000000 00000000 00000000
       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Call Trace:
[<c0103fad>] show_stack+0x7a/0x90
[<c010412b>] show_registers+0x14f/0x1c7
[<c010516b>] die+0x11a/0x195
[<c042b54b>] do_page_fault+0xa77/0x360d
[<c0103cdb>] error_code+0x4f/0x54
[<f89fa5af>] sctp_getsockopt+0x1ef/0x322 [sctp]
[<c03aabb7>] sock_common_getsockopt+0x22/0x2c
[<c03a7f6b>] sys_getsockopt+0x49/0x82
[<c03a8e22>] sys_socketcall+0xa5a/0xa9b
[<c04239c4>] no_syscall_entry_trace+0xb/0xf
Code: ff ff ff 0f b7 86 9a 00 00 00 66 89 85 54 ff ff ff 0f b7 86 98 00 00 00 6

4)  Then, after the above oops happened, issuing 'cat
/proc/net/sctp/assocs' on the same computer locks it up after the
following oops is dumped to the serial port.

atcafs-n0s5:~# cat /proc/net/sctp/assocs

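...then the lock up.  In the serial console capture below, the partial
/proc/net/sctp/assocs listing is interleaved with the oops text: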

ASSOC     SOCK <1>Unable to handle kernel NULL pointer dereference  STY SST STc
printing eip:
T ASSOC-ID TX_QUf89ec250
*pde = 00000000
EUE RX_QUEUE UIDOops: 0000 [#2]
SMP
LTT NESTING LEVEL : 0
Modules linked in: sctp ip_queue iptable_filter ip_tables bonding loop ohci_hcdc
CPU:    3
INODE LPORT RPOEIP:    0060:[<f89ec250>]    Not tainted VLI
EFLAGS: 00010206   (2.6.14.7-selinux1-WR1.4aq_cgl)
RT LADDRS <-> RAEIP is at sctp_v4_cmp_addr+0x3/0x2f [sctp]
eax: d7824c10   ebx: d7824c00   ecx: d7824c10   edx: 0000005c
DDRS
d8c36000 desi: f8a0a680   edi: d7824c10   ebp: d810bdb0   esp: d810bd88
ds: 007b   es: 007b   ss: 0068
Process cat (pid: 15602, threadinfo=d810b000 task=d6d0b450)
Stack: d810bdb0 f89fd7ea d78f61ae d6e9ea00 d81d0064 0000005c d6e9ea00 d70cb700
       00a7f18d d81d0000 d810be0c f89fdc58 d6e9ea00 f8a002e7 d81d0000 d70cb700
       00000002 00000001 00000001 0000f425 00000000 00000000 00000000 00000000
Call Trace:
[<c0103fad>] show_stack+0x7a/0x908d83700 2   10
[<c010412b>] show_registers+0x14f/0x1c7
[<c010516b>] die+0x11a/0x195
[<c042b54b>] do_page_fault+0xa77/0x360d
[<c0103cdb>] error_code+0x4f/0x54
[<f89fdc58>] 1  6499 1523    sctp_assocs_seq_show+0xf5/0x146 [sctp]
[<c019a8a2>] seq_read+0x1f8/0x28e    0      229
[<c0175187>] vfs_read+0xc4/0x169     0 96474 140
[<c0175812>] sys_read+0x371/0x132c
[<c04239c4>] no_syscall_entry_trace+0xb/0xf
Code: 00 08 8b 40 04 89 e5 5d 89 42 04 b8 08 00 00 00 c3 55 66 c7 00 02 00 66 8
01 43211  *10.6.<0>Kernel panic - not syncing: Fatal exception in interrupt
48.5 <-> *62.11.

Thanks,
  Mike Allport
--
To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html