Re: Unhandled kernel unaligned access on IP32 w/ network I/O && 3.7.1?

On 12/30/2012 3:23 AM, Joshua Kinard wrote:
> 
> Here's an untainted oops from IP32.  Triggered by logging in over SSH on
> IPv6 and running 'dmesg':
> 
> Unhandled kernel unaligned access[#1]:
> Cpu 0
> $ 0   : 0000000000000000 0000000000000010 0000000000000000 bfffff005e17aac4
> $ 4   : 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> $ 8   : 980000005e00e000 0000000000000000 980000005e00e000 0000000000000410
> $12   : ffffffff9001fce1 000000001000001e fffffffffffff000 000000000000001f
> $16   : 980000005e03fa40 ffffffffde0300b8 ffffff0000000000 0000000000000034
> $20   : 00000000006532d8 0000000000000594 00000000004a1134 00000000004a0000
> $24   : 0000000000000001 00000000000003f0
> $28   : 980000005e03c000 980000005e03fa10 0000000000000000 ffffffff800059a0
> Hi    : 000000000011a02a
> Lo    : 000000000005e00e
> epc   : ffffffff8000b700 do_ade+0x1b0/0x480
>     Not tainted
> ra    : ffffffff800059a0 ret_from_exception+0x0/0x24
> Status: 9001fce3    KX SX UX KERNEL EXL IE
> Cause : 00000010
> BadVA : bfffff005e17aac4
> PrId  : 00002733 (RM7000)
> Process sshd (pid: 1323, threadinfo=980000005e03c000, task=980000005fe76000,
> tls=0000000077010490)
> Stack : 980000005e00e6a0 980000005e17aa0c 980000005faef000 0000000000000594
>         0000000000000034 ffffffff800059a0 0000000000000000 0000000000000010
>         00000000000000d0 0000000000000000 980000005faef000 00000000000008a0
>         0000000000000000 0000000000000000 980000005e00e000 0000000000000000
>         980000005e00e000 0000000000000410 0000000000000020 ffffffff80223b6c
>         fffffffffffff000 000000000000001f 980000005e17aa0c 980000005faef000
>         0000000000000594 0000000000000034 00000000006532d8 0000000000000594
>         00000000004a1134 00000000004a0000 0000000000000001 00000000000003f0
>         0000000000000014 ffffffff802de0d0 980000005e03c000 980000005e03fb70
>         0000000000000000 ffffffff80334ef8 ffffffff9001fce3 000000000011a02a
>         ...
> Call Trace:
> [<ffffffff8000b700>] do_ade+0x1b0/0x480
> [<ffffffff800059a0>] ret_from_exception+0x0/0x24
> [<ffffffff80334f24>] sk_stream_alloc_skb+0x6c/0x118
> [<ffffffff80335e8c>] tcp_sendmsg+0x6fc/0xe90
> [<ffffffff802d3744>] sock_aio_write+0x10c/0x150
> [<ffffffff800b48c4>] do_sync_write+0x9c/0x108
> [<ffffffff800b4a98>] vfs_write+0x168/0x180
> [<ffffffff800b4bbc>] SyS_write+0x54/0xb8
> [<ffffffff80013538>] handle_sys+0x118/0x13c
> 
> 
> Code: 00441024  5440ffe6  de030100 <68730000> 6c730007  24030000  14600040
> 00000000  8e020124
> ---[ end trace 8127ff095caa30f9 ]---
> 
> 
> Turns out it is non-fatal.  The serial console is still alive, but sshd was
> terminated as a result (it's in the 'Ds' state under ps ux output).

Some quick digging with objdump, plus a new oops from a kernel rebuilt with
full debugging, points at an inlined call to skb_reserve() within
sk_stream_alloc_skb() in net/ipv4/tcp.c.


Bottom of the new oops:
Call Trace:
[<ffffffff8000b710>] do_ade+0x1b0/0x480
[<ffffffff800059a0>] ret_from_exception+0x0/0x24
[<ffffffff803352dc>] sk_stream_alloc_skb+0x6c/0x118
[<ffffffff8033624c>] tcp_sendmsg+0x6fc/0xe98
[<ffffffff802d3c44>] sock_aio_write+0x10c/0x150
[<ffffffff800b5cd4>] do_sync_write+0x9c/0x108
[<ffffffff800b5ea8>] vfs_write+0x168/0x180
[<ffffffff800b5fcc>] SyS_write+0x54/0xb8
[<ffffffff80013558>] handle_sys+0x118/0x13c

Disassembling vmlinux and looking up address ffffffff803352dc yields this:
                if (sk_wmem_schedule(sk, skb->truesize)) {
                        skb_reserve(skb, sk->sk_prot->max_header);
ffffffff803352d8:       8c420108        lw      v0,264(v0)
 *      Increase the headroom of an empty &sk_buff by reducing the tail
 *      room. This is only allowed for an empty buffer.
 */
static inline void skb_reserve(struct sk_buff *skb, int len)
{
        skb->data += len;
ffffffff803352dc:       de0300b8        ld      v1,184(s0)
        skb->tail += len;
ffffffff803352e0:       8e0400a8        lw      a0,168(s0)
 *      Increase the headroom of an empty &sk_buff by reducing the tail
 *      room. This is only allowed for an empty buffer.
 */
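
As a cross-check, the instruction words in the listing can be decoded by
hand. Below is a minimal sketch, assuming the standard MIPS I-type layout
(opcode in bits 31..26, rs in 25..21, rt in 20..16, signed 16-bit immediate
in 15..0). The struct and function names are made up for illustration and
are not kernel code, and only the four load opcodes seen in this thread are
named:

```c
#include <stdint.h>

/* Fields of a MIPS I-type instruction (hypothetical helper type). */
struct itype {
    unsigned opcode;  /* bits 31..26 */
    unsigned rs;      /* bits 25..21, base register number */
    unsigned rt;      /* bits 20..16, target register number */
    int16_t  imm;     /* bits 15..0, signed byte offset */
};

/* Split a 32-bit instruction word into its I-type fields. */
static struct itype decode_itype(uint32_t insn)
{
    struct itype d;

    d.opcode = insn >> 26;
    d.rs     = (insn >> 21) & 0x1f;
    d.rt     = (insn >> 16) & 0x1f;
    d.imm    = (int16_t)(insn & 0xffff);
    return d;
}

/* Name only the load opcodes that show up in the oops and disassembly. */
static const char *mnemonic(unsigned opcode)
{
    switch (opcode) {
    case 0x1a: return "ldl";  /* load doubleword left */
    case 0x1b: return "ldr";  /* load doubleword right */
    case 0x23: return "lw";   /* load 32-bit word */
    case 0x37: return "ld";   /* load 64-bit doubleword */
    default:   return "?";    /* anything else is not handled here */
    }
}
```

Feeding it 0xde0300b8 gives opcode 0x37 (ld), rs = 16 (s0), rt = 3 (v1) and
imm = 184, which matches the "ld v1,184(s0)" line above.  The word marked in
the Code: line of the first oops, 0x68730000, comes out as ldl with rs = 3
(v1) and imm = 0, and $3 in that register dump holds the same value as BadVA.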


I looked through several files in git, mainly net/ipv4/tcp.c, and none of
the recent changes in 3.7 stands out immediately as the cause.  I'll have to
either use git bisect or run kgdb on it to figure out anything more.

Does this look like a case of scheduling while atomic?  There's a fix in
davem's -next tree that addresses such a cause, but I haven't tried it yet
to see whether it's the same issue.

-- 
Joshua Kinard
Gentoo/MIPS
kumba@xxxxxxxxxx
4096R/D25D95E3 2011-03-28

"The past tempts us, the present confuses us, the future frightens us.  And
our lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic
