On 12/30/2012 3:23 AM, Joshua Kinard wrote:
>
> Here's an untainted oops from IP32. Triggered by logging in over SSH on
> IPv6 and running 'dmesg':
>
> Unhandled kernel unaligned access[#1]:
> Cpu 0
> $ 0   : 0000000000000000 0000000000000010 0000000000000000 bfffff005e17aac4
> $ 4   : 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> $ 8   : 980000005e00e000 0000000000000000 980000005e00e000 0000000000000410
> $12   : ffffffff9001fce1 000000001000001e fffffffffffff000 000000000000001f
> $16   : 980000005e03fa40 ffffffffde0300b8 ffffff0000000000 0000000000000034
> $20   : 00000000006532d8 0000000000000594 00000000004a1134 00000000004a0000
> $24   : 0000000000000001 00000000000003f0
> $28   : 980000005e03c000 980000005e03fa10 0000000000000000 ffffffff800059a0
> Hi    : 000000000011a02a
> Lo    : 000000000005e00e
> epc   : ffffffff8000b700 do_ade+0x1b0/0x480
>     Not tainted
> ra    : ffffffff800059a0 ret_from_exception+0x0/0x24
> Status: 9001fce3    KX SX UX KERNEL EXL IE
> Cause : 00000010
> BadVA : bfffff005e17aac4
> PrId  : 00002733 (RM7000)
> Process sshd (pid: 1323, threadinfo=980000005e03c000, task=980000005fe76000,
> tls=0000000077010490)
> Stack : 980000005e00e6a0 980000005e17aa0c 980000005faef000 0000000000000594
>         0000000000000034 ffffffff800059a0 0000000000000000 0000000000000010
>         00000000000000d0 0000000000000000 980000005faef000 00000000000008a0
>         0000000000000000 0000000000000000 980000005e00e000 0000000000000000
>         980000005e00e000 0000000000000410 0000000000000020 ffffffff80223b6c
>         fffffffffffff000 000000000000001f 980000005e17aa0c 980000005faef000
>         0000000000000594 0000000000000034 00000000006532d8 0000000000000594
>         00000000004a1134 00000000004a0000 0000000000000001 00000000000003f0
>         0000000000000014 ffffffff802de0d0 980000005e03c000 980000005e03fb70
>         0000000000000000 ffffffff80334ef8 ffffffff9001fce3 000000000011a02a
>         ...
> Call Trace:
> [<ffffffff8000b700>] do_ade+0x1b0/0x480
> [<ffffffff800059a0>] ret_from_exception+0x0/0x24
> [<ffffffff80334f24>] sk_stream_alloc_skb+0x6c/0x118
> [<ffffffff80335e8c>] tcp_sendmsg+0x6fc/0xe90
> [<ffffffff802d3744>] sock_aio_write+0x10c/0x150
> [<ffffffff800b48c4>] do_sync_write+0x9c/0x108
> [<ffffffff800b4a98>] vfs_write+0x168/0x180
> [<ffffffff800b4bbc>] SyS_write+0x54/0xb8
> [<ffffffff80013538>] handle_sys+0x118/0x13c
>
>
> Code: 00441024 5440ffe6 de030100 <68730000> 6c730007 24030000 14600040
> 00000000 8e020124
> ---[ end trace 8127ff095caa30f9 ]---
>
>
> Turns out it is non-fatal. The serial console is still alive, but sshd was
> terminated as a result (it's in the 'Ds' state under ps ux output).

Some quick digging via objdump and a new oops, from a rebuilt kernel
including full debugging, points at an inlined call to skb_reserve from
within sk_stream_alloc_skb in net/ipv4/tcp.c.

Bottom of new oops:

Call Trace:
[<ffffffff8000b710>] do_ade+0x1b0/0x480
[<ffffffff800059a0>] ret_from_exception+0x0/0x24
[<ffffffff803352dc>] sk_stream_alloc_skb+0x6c/0x118
[<ffffffff8033624c>] tcp_sendmsg+0x6fc/0xe98
[<ffffffff802d3c44>] sock_aio_write+0x10c/0x150
[<ffffffff800b5cd4>] do_sync_write+0x9c/0x108
[<ffffffff800b5ea8>] vfs_write+0x168/0x180
[<ffffffff800b5fcc>] SyS_write+0x54/0xb8
[<ffffffff80013558>] handle_sys+0x118/0x13c

Disassembly of vmlinux, and match of address ffffffff803352dc yields this:

        if (sk_wmem_schedule(sk, skb->truesize)) {
                skb_reserve(skb, sk->sk_prot->max_header);
ffffffff803352d8:       8c420108        lw      v0,264(v0)
 *      Increase the headroom of an empty &sk_buff by reducing the tail
 *      room. This is only allowed for an empty buffer.
 */
static inline void skb_reserve(struct sk_buff *skb, int len)
{
        skb->data += len;
ffffffff803352dc:       de0300b8        ld      v1,184(s0)
        skb->tail += len;
ffffffff803352e0:       8e0400a8        lw      a0,168(s0)
 *      Increase the headroom of an empty &sk_buff by reducing the tail
 *      room. This is only allowed for an empty buffer.
 */

I looked around at several files in git, mainly, net/ipv4/tcp.c, and none of
the recent changes to 3.7 sticks out immediately as the cause. I'll either
have to use git bisect or run kgdb on it to figure anything else out.

Does this look like a case of scheduling while atomic? There's a fix in
davem's -next tree that addresses such a cause, but I haven't tried that just
yet to see if it's the same issue.

-- 
Joshua Kinard
Gentoo/MIPS
kumba@xxxxxxxxxx
4096R/D25D95E3 2011-03-28

"The past tempts us, the present confuses us, the future frightens us. And
our lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic
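
For anyone who wants to decode the "Code:" line by hand, here is a minimal Python sketch. It is not part of the original mail; the function name, the register-name table (o32-style names), and the four-opcode subset are assumptions chosen only to cover the words quoted above. It splits a MIPS I-type load into opcode, base register, target register, and signed offset. It reproduces the three objdump lines from the disassembly, and it decodes the marked word <68730000> and the word after it as an ldl/ldr pair based on v1. In the first register dump, $3 (v1) holds the same value as BadVA, and that value is not 8-byte aligned.

```python
# Sketch only: decodes a handful of MIPS I-type load instructions.
# Register names and the opcode subset are illustrative assumptions.
REGS = [
    "zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
    "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7",
    "s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7",
    "t8", "t9", "k0", "k1", "gp", "sp", "s8", "ra",
]

# Primary opcode (bits 31..26) -> mnemonic, loads only.
LOADS = {0x1A: "ldl", 0x1B: "ldr", 0x23: "lw", 0x37: "ld"}


def decode_load(word):
    """Return 'mnemonic rt,offset(base)' or None if the opcode is not covered."""
    opcode = (word >> 26) & 0x3F
    base = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    offset = word & 0xFFFF
    if offset & 0x8000:          # sign-extend the 16-bit immediate
        offset -= 0x10000
    name = LOADS.get(opcode)
    if name is None:
        return None
    return f"{name} {REGS[rt]},{offset}({REGS[base]})"


if __name__ == "__main__":
    # Words taken from the objdump output and the "Code:" line above.
    for w in (0x8C420108, 0xDE0300B8, 0x8E0400A8, 0x68730000, 0x6C730007):
        print(f"{w:08x}  {decode_load(w)}")

    bad_va = 0xBFFFFF005E17AAC4  # BadVA from the first oops
    print("BadVA mod 8 =", bad_va % 8)
```

Words with an opcode outside the small table come back as None rather than a guess, so the sketch is only useful for spot-checking load instructions like the ones in this report.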