Re: [sparc] ext3 corruption on latest mainline

> > > > > > I ended up with a corrupted ext3 (see sparc.jpg) and am working now to restore it.
> > > > > 
> > > > > Up and running again. I managed to get some more info from yesterday's crash from syslog:
> > > > 
> > > > I've seen similar corruptions with an Intel SSD drive on an MPT Fusion
> > > > SAS interface, which also goes through scsi like your sym case here.
> > > > 
> > > > But I'm pretty sure I also saw them with 2.6.28, were you able to run
> > > > 2.6.28 cleanly?
> > > 
> > > The thing is, the last time I ran this sparc was somewhere in the 2.6.27~2.6.28 window.
> > > I didn't see anything like it at that time. I'll give 2.6.28 a try, but it might
> > > be hard to trigger as I don't know what exactly caused it. Also, yesterday
> > > the very same kernel (f3b8436a) worked under I/O load for hours just fine.
> > > I'll leave it busy with 2.6.28 for a few days and see what happens.
> > 
> > This is vanilla 2.6.28 and it shows memory corruption problems as well.
> > After a few hours I got these:
> > 
> > (see my comments at the end of mail)
> > 
> > Jan 22 15:49:27 localhost kernel: =============================================================================
> > Jan 22 15:49:27 localhost kernel: BUG tsb_16KB: Object padding overwritten
> > Jan 22 15:49:27 localhost kernel: -----------------------------------------------------------------------------
> > Jan 22 15:49:27 localhost kernel: 
> > Jan 22 15:49:27 localhost kernel: INFO: 0xfffff800bdb7fcc0-0xfffff800bdb7fcff. First byte 0x0 instead of 0x5a
> > Jan 22 15:49:27 localhost kernel: INFO: Allocated in tsb_grow+0x88/0x440 age=19349783 cpu=0 pid=3212
> > Jan 22 15:49:27 localhost kernel: INFO: Freed in tsb_grow+0x2f0/0x440 age=19810433 cpu=0 pid=2823
> > Jan 22 15:49:27 localhost kernel: INFO: Slab 0x0000000202392680 objects=1 used=1 fp=0x0000000000000000 flags=0x2083
> > Jan 22 15:49:27 localhost kernel: INFO: Object 0xfffff800bdb78000 @offset=0 fp=0x0000000000000000
> > Jan 22 15:49:27 localhost kernel: 
> > Jan 22 15:49:27 localhost kernel:   Object 0xfffff800bdb78000:  00 00 40 00 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b ..@.....kkkkkkkk
> [...]
> > Jan 22 15:49:28 localhost kernel:  Padding 0xfffff800bdb7fff0:  5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> > Jan 22 15:49:28 localhost kernel: Call Trace:
> > Jan 22 15:49:28 localhost kernel:  [00000000004bde04] print_trailer+0xc4/0x160
> > Jan 22 15:49:28 localhost kernel:  [00000000004be4a8] check_bytes_and_report+0xa8/0xe0
> > Jan 22 15:49:28 localhost kernel:  [00000000004be6e0] check_object+0x200/0x260
> > Jan 22 15:49:28 localhost kernel:  [00000000004bee0c] __slab_free+0x2ac/0x440
> > Jan 22 15:49:28 localhost kernel:  [00000000004c1ce8] kmem_cache_free+0x88/0xe0
> > Jan 22 15:49:28 localhost kernel:  [00000000004448e0] destroy_context+0x20/0xa0
> > Jan 22 15:49:28 localhost kernel:  [000000000045a260] __mmdrop+0x80/0xc0
> > Jan 22 15:49:28 localhost kernel:  [00000000004552e8] finish_task_switch+0x88/0xc0
> > Jan 22 15:49:28 localhost kernel:  [00000000006bf874] switch_to_pc+0x88/0x4d4
> > Jan 22 15:49:28 localhost kernel:  [00000000006c11a4] schedule_hrtimeout_range+0xc4/0xe0
> > Jan 22 15:49:28 localhost kernel:  [00000000004d3338] do_select+0x3b8/0x4c0
> > Jan 22 15:49:28 localhost kernel:  [00000000004fb300] compat_core_sys_select+0x160/0x200
> > Jan 22 15:49:28 localhost kernel:  [00000000004fce60] compat_sys_select+0x20/0x100
> > Jan 22 15:49:28 localhost kernel:  [0000000000406254] linux_sparc_syscall32+0x34/0x40
> > Jan 22 15:49:28 localhost kernel: FIX tsb_16KB: Restoring 0xfffff800bdb7fcc0-0xfffff800bdb7fcff=0x5a
> > Jan 22 15:49:28 localhost kernel: 
> > Jan 22 15:55:40 localhost init: Id "s0" respawning too fast: disabled for 5 minutes
> > Jan 22 16:02:21 localhost init: Id "s0" respawning too fast: disabled for 5 minutes
> > Jan 22 16:09:02 localhost init: Id "s0" respawning too fast: disabled for 5 minutes
> > Jan 22 16:15:43 localhost init: Id "s0" respawning too fast: disabled for 5 minutes
> > Jan 22 16:22:24 localhost init: Id "s0" respawning too fast: disabled for 5 minutes
> > Jan 22 16:29:05 localhost init: Id "s0" respawning too fast: disabled for 5 minutes
> > Jan 22 16:35:46 localhost init: Id "s0" respawning too fast: disabled for 5 minutes
> > Jan 22 16:42:27 localhost init: Id "s0" respawning too fast: disabled for 5 minutes
> > Jan 22 16:45:40 localhost rc-scripts: WARNING:  sshd has already been started.
> > Jan 22 16:48:50 localhost kernel: =============================================================================
> > Jan 22 16:48:50 localhost kernel: BUG kmalloc-64: Redzone overwritten
> > Jan 22 16:48:50 localhost kernel: -----------------------------------------------------------------------------
> > Jan 22 16:48:50 localhost kernel: 
> > Jan 22 16:48:50 localhost kernel: INFO: 0xfffff800bdd526a0-0xfffff800bdd526a7. First byte 0x0 instead of 0xbb
> > Jan 22 16:48:50 localhost kernel: INFO: Freed in free_rb_tree_fname+0x48/0xc0 age=104208 cpu=0 pid=5636
> > Jan 22 16:48:50 localhost kernel: INFO: Slab 0x0000000202397f60 objects=60 used=59 fp=0xfffff800bdd52660 flags=0x00c3
> > Jan 22 16:48:50 localhost kernel: INFO: Object 0xfffff800bdd52660 @offset=1632 fp=0x0000000000000000
> > Jan 22 16:48:50 localhost kernel: 
> > Jan 22 16:48:50 localhost kernel: Bytes b4 0xfffff800bdd52650:  00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
> > Jan 22 16:48:50 localhost kernel:   Object 0xfffff800bdd52660:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > Jan 22 16:48:50 localhost kernel:   Object 0xfffff800bdd52670:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > Jan 22 16:48:50 localhost kernel:   Object 0xfffff800bdd52680:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> > Jan 22 16:48:50 localhost kernel:   Object 0xfffff800bdd52690:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> > Jan 22 16:48:50 localhost kernel:  Redzone 0xfffff800bdd526a0:  00 00 00 00 00 00 00 00                         ........        
> > Jan 22 16:48:50 localhost kernel:  Padding 0xfffff800bdd526e0:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ        
> > Jan 22 16:48:50 localhost kernel: Call Trace:
> > Jan 22 16:48:50 localhost kernel:  [00000000004bde04] print_trailer+0xc4/0x160
> > Jan 22 16:48:50 localhost kernel:  [00000000004be4a8] check_bytes_and_report+0xa8/0xe0
> > Jan 22 16:48:50 localhost kernel:  [00000000004be528] check_object+0x48/0x260
> > Jan 22 16:48:50 localhost kernel:  [00000000004bfadc] __slab_alloc+0x59c/0x720
> > Jan 22 16:48:50 localhost kernel:  [00000000004bfef4] __kmalloc+0xf4/0x120
> > Jan 22 16:48:50 localhost kernel:  [0000000000519258] ext3_htree_store_dirent+0x18/0x160
> > Jan 22 16:48:50 localhost kernel:  [00000000005232d0] htree_dirblock_to_tree+0x170/0x1e0
> > Jan 22 16:48:50 localhost kernel:  [0000000000523394] ext3_htree_fill_tree+0x54/0x260
> > Jan 22 16:48:50 localhost kernel:  [0000000000519900] ext3_readdir+0x560/0x660
> > Jan 22 16:48:50 localhost kernel:  [00000000004d22f8] vfs_readdir+0x78/0xc0
> > Jan 22 16:48:50 localhost kernel:  [00000000004d2370] sys_getdents64+0x30/0xa0
> > Jan 22 16:48:50 localhost kernel:  [0000000000406254] linux_sparc_syscall32+0x34/0x40
> > Jan 22 16:48:50 localhost kernel: FIX kmalloc-64: Restoring 0xfffff800bdd526a0-0xfffff800bdd526a7=0xbb
> > Jan 22 16:48:50 localhost kernel: 
> > Jan 22 16:48:50 localhost kernel: FIX kmalloc-64: Marking all objects used
> > Jan 22 16:49:09 localhost init: Id "s0" respawning too fast: disabled for 5 minutes
> > Jan 22 16:49:14 localhost kernel: SysRq : Emergency Sync
> > Jan 22 16:49:14 localhost kernel: Emergency Sync complete
> > 
> > I left the box doing some compilation etc. Somewhere around 16:45 I couldn't log into it via ssh, so I logged into
> > the box on a few terminals, started typing something, and the terminals started freezing one after another. At the
> > end the buzzer was making a constant sound and then the keyboard stopped working. At that point I considered the
> > box dead, so I restarted it manually. I'll push it some more to see if/what else pops out.
> 
> Another one:
> 
> =============================================================================
> BUG nfs_inode_cache: Redzone overwritten
> -----------------------------------------------------------------------------
> 
> INFO: 0xfffff800affe7b78-0xfffff800affe7b7f. First byte 0x30 instead of 0xcc
> INFO: Allocated in nfs_alloc_inode+0x10/0x40 age=11324173 cpu=0 pid=4471
> INFO: Slab 0x00000002020ffa00 objects=22 used=22 fp=0x0000000000000000 flags=0x2083
> INFO: Object 0xfffff800affe7620 @offset=30240 fp=0x0000000000000000
> 
> Bytes b4 0xfffff800affe7610:  00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
>   Object 0xfffff800affe7620:  00 00 00 00 01 f4 c4 89 00 24 01 00 07 01 9a 44 .....ôÄ..$.....D
>   Object 0xfffff800affe7630:  6b 01 00 00 00 00 58 c4 5a 3f 85 9e 41 3f ae b8 k.....XÄZ?..A?®¸
>   Object 0xfffff800affe7640:  1d 46 52 42 55 be 89 c4 f4 01 b8 cd d9 ad 5a 5a .FRBU¾.Äô.¸ÍÙ­ZZ
>   Object 0xfffff800affe7650:  5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
>   Object 0xfffff800affe7660:  5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
>   Object 0xfffff800affe7670:  5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
>   Object 0xfffff800affe7680:  5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
>   Object 0xfffff800affe7690:  5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
>   Object 0xfffff800affe76a0:  5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
>   Object 0xfffff800affe76b0:  00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 02 ................
>   Object 0xfffff800affe76c0:  00 00 00 01 00 02 8a bd 00 00 00 00 00 00 75 30 .......½......u0
>   Object 0xfffff800affe76d0:  00 00 00 01 00 02 88 b7 00 00 00 00 00 00 00 01 .......·........
>   Object 0xfffff800affe76e0:  00 00 00 00 00 00 36 b8 5a 5a 5a 5a 5a 5a 5a 5a ......6¸ZZZZZZZZ
>   Object 0xfffff800affe76f0:  00 00 00 00 00 00 00 00 ff ff f8 00 af fe 76 f8 ........ÿÿø.¯þvø
>   Object 0xfffff800affe7700:  ff ff f8 00 af fe 76 f8 ff ff f8 00 af fe 77 08 ÿÿø.¯þvøÿÿø.¯þw.
>   Object 0xfffff800affe7710:  ff ff f8 00 af fe 77 08 00 00 00 00 00 00 00 00 ÿÿø.¯þw.........
>   Object 0xfffff800affe7720:  00 00 00 00 00 00 00 20 00 00 00 00 00 00 00 00 ................
>   Object 0xfffff800affe7730:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>   Object 0xfffff800affe7740:  ff ff f8 00 af fe 77 40 ff ff f8 00 af fe 77 40 ÿÿø.¯þw@ÿÿø.¯þw@
>   Object 0xfffff800affe7750:  00 00 00 01 5a 5a 5a 5a 00 00 00 00 00 00 00 00 ....ZZZZ........
>   Object 0xfffff800affe7760:  00 5a 5a 5a 5a 5a 5a 5a de ad 4e ad ff ff ff ff .ZZZZZZZÞ­N­ÿÿÿÿ
>   Object 0xfffff800affe7770:  ff ff ff ff ff ff ff ff 00 00 00 00 00 92 a6 30 ÿÿÿÿÿÿÿÿ......¦0
>   Object 0xfffff800affe7780:  00 00 00 00 00 00 00 00 00 00 00 00 00 75 b3 c8 .............u³È
>   Object 0xfffff800affe7790:  ff ff f8 00 af fe 77 90 ff ff f8 00 af fe 77 90 ÿÿø.¯þw.ÿÿø.¯þw.
>   Object 0xfffff800affe77a0:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>   Object 0xfffff800affe77b0:  00 00 00 00 00 10 01 00 00 00 00 00 00 20 02 00 ................
>   Object 0xfffff800affe77c0:  ff ff f8 00 af fe 77 c0 ff ff f8 00 af fe 77 c0 ÿÿø.¯þwÀÿÿø.¯þwÀ
>   Object 0xfffff800affe77d0:  ff ff f8 00 af fe 77 d0 ff ff f8 00 af fe 77 d0 ÿÿø.¯þwÐÿÿø.¯þwÐ
>   Object 0xfffff800affe77e0:  00 00 00 00 01 f4 c4 89 00 00 00 00 00 00 00 02 .....ôÄ.........
>   Object 0xfffff800affe77f0:  00 00 03 e8 00 00 03 e8 00 00 00 00 00 00 00 00 ...è...è........
>   Object 0xfffff800affe7800:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 ................
>   Object 0xfffff800affe7810:  00 00 00 00 48 ea 74 ac 00 00 00 00 00 00 00 00 ....Hêt¬........
>   Object 0xfffff800affe7820:  00 00 00 00 49 76 fe 74 00 00 00 00 00 00 00 00 ....Ivþt........
>   Object 0xfffff800affe7830:  00 00 00 00 49 76 fe 74 00 00 00 00 00 00 00 00 ....Ivþt........
>   Object 0xfffff800affe7840:  00 00 00 12 00 00 00 00 00 00 00 00 00 00 00 08 ................
>   Object 0xfffff800affe7850:  00 00 41 ed 00 00 00 00 00 00 00 00 00 00 00 00 ..Aí............
>   Object 0xfffff800affe7860:  de ad 4e ad ff ff ff ff ff ff ff ff ff ff ff ff Þ­N­ÿÿÿÿÿÿÿÿÿÿÿÿ
>   Object 0xfffff800affe7870:  00 00 00 00 00 7b be 88 00 00 00 00 00 00 00 00 .....{¾.........
>   Object 0xfffff800affe7880:  00 00 00 00 00 76 4b f0 00 00 00 01 00 00 00 00 .....vKð........
>   Object 0xfffff800affe7890:  00 00 00 00 00 00 00 00 de ad 4e ad ff ff ff ff ........Þ­N­ÿÿÿÿ
>   Object 0xfffff800affe78a0:  ff ff ff ff ff ff ff ff 00 00 00 00 00 92 a6 60 ÿÿÿÿÿÿÿÿ......¦`
>   Object 0xfffff800affe78b0:  00 00 00 00 00 00 00 00 00 00 00 00 00 75 b4 c0 .............u´À
>   Object 0xfffff800affe78c0:  ff ff f8 00 af fe 78 c0 ff ff f8 00 af fe 78 c0 ÿÿø.¯þxÀÿÿø.¯þxÀ
>   Object 0xfffff800affe78d0:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>   Object 0xfffff800affe78e0:  ff ff f8 00 af fe 78 88 00 00 00 00 00 7b be 98 ÿÿø.¯þx......{¾.
>   Object 0xfffff800affe78f0:  00 00 00 00 00 00 00 00 00 00 00 00 00 76 4c 70 .............vLp
>   Object 0xfffff800affe7900:  00 69 7a 38 00 69 78 d8 00 00 00 00 00 00 00 00 .iz8.ixØ........
>   Object 0xfffff800affe7910:  00 69 78 f8 00 00 00 00 00 5f 38 d0 00 71 dc c0 .ixø....._8Ð.qÜÀ
>   Object 0xfffff800affe7920:  00 69 79 38 00 71 dc e0 00 00 00 00 00 00 00 00 .iy8.qÜà........
>   Object 0xfffff800affe7930:  00 69 79 18 00 00 00 02 00 5f 38 d0 00 69 79 18 .iy......_8Ð.iy.
>   Object 0xfffff800affe7940:  ff ff f8 00 af fe 79 38 00 00 00 00 00 7b be a0 ÿÿø.¯þy8.....{¾.
>   Object 0xfffff800affe7950:  00 00 00 00 00 00 00 00 00 00 00 00 00 76 4c 50 .............vLP
>   Object 0xfffff800affe7960:  00 00 00 00 00 6d 5e 38 00 00 00 00 00 6d 5c a8 .....m^8.....m\¨
>   Object 0xfffff800affe7970:  ff ff f8 00 bd 84 5b 18 00 00 00 00 00 00 00 00 ÿÿø.½.[.........
>   Object 0xfffff800affe7980:  ff ff f8 00 af fe 79 88 ff ff f8 00 af fe 77 a0 ÿÿø.¯þy.ÿÿø.¯þw.
>   Object 0xfffff800affe7990:  00 00 00 00 00 00 00 20 00 00 00 00 00 00 00 00 ................
>   Object 0xfffff800affe79a0:  00 00 00 00 00 00 00 00 de ad 4e ad ff ff ff ff ........Þ­N­ÿÿÿÿ
>   Object 0xfffff800affe79b0:  ff ff ff ff ff ff ff ff 00 00 00 00 00 e7 1d 70 ÿÿÿÿÿÿÿÿ.....ç.p
>   Object 0xfffff800affe79c0:  00 00 00 00 00 00 00 00 00 00 00 00 00 76 4c d0 .............vLÐ
>   Object 0xfffff800affe79d0:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>   Object 0xfffff800affe79e0:  00 01 00 01 00 00 00 00 ff ff f8 00 af fe 79 e8 ........ÿÿø.¯þyè
>   Object 0xfffff800affe79f0:  ff ff f8 00 af fe 79 e8 00 00 00 00 00 00 00 00 ÿÿø.¯þyè........
>   Object 0xfffff800affe7a00:  de ad 4e ad ff ff ff ff ff ff ff ff ff ff ff ff Þ­N­ÿÿÿÿÿÿÿÿÿÿÿÿ
>   Object 0xfffff800affe7a10:  00 00 00 00 00 e7 1d 68 00 00 00 00 00 00 00 00 .....ç.h........
>   Object 0xfffff800affe7a20:  00 00 00 00 00 76 4c f0 00 00 00 00 00 00 00 00 .....vLð........
>   Object 0xfffff800affe7a30:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>   Object 0xfffff800affe7a40:  00 00 00 00 00 e7 1c c8 00 00 00 00 00 12 00 d2 .....ç.È.......Ò
>   Object 0xfffff800affe7a50:  00 00 00 00 00 7b 46 40 00 00 00 00 00 00 00 00 .....{F@........
>   Object 0xfffff800affe7a60:  de ad 4e ad ff ff ff ff ff ff ff ff ff ff ff ff Þ­N­ÿÿÿÿÿÿÿÿÿÿÿÿ
>   Object 0xfffff800affe7a70:  00 00 00 00 00 e7 1d 60 00 00 00 00 00 00 00 00 .....ç.`........
>   Object 0xfffff800affe7a80:  00 00 00 00 00 76 4d 10 ff ff f8 00 af fe 7a 88 .....vM.ÿÿø.¯þz.
>   Object 0xfffff800affe7a90:  ff ff f8 00 af fe 7a 88 00 00 00 00 00 00 00 00 ÿÿø.¯þz.........
>   Object 0xfffff800affe7aa0:  ff ff f8 00 af fe 7a a0 ff ff f8 00 af fe 7a a0 ÿÿø.¯þz.ÿÿø.¯þz.
>   Object 0xfffff800affe7ab0:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>   Object 0xfffff800affe7ac0:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>   Object 0xfffff800affe7ad0:  ff ff f8 00 af fe 7a d0 ff ff f8 00 af fe 7a d0 ÿÿø.¯þzÐÿÿø.¯þzÐ
>   Object 0xfffff800affe7ae0:  00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 ................
>   Object 0xfffff800affe7af0:  de ad 4e ad ff ff ff ff ff ff ff ff ff ff ff ff Þ­N­ÿÿÿÿÿÿÿÿÿÿÿÿ
>   Object 0xfffff800affe7b00:  00 00 00 00 00 92 a6 60 00 00 00 00 00 00 00 00 ......¦`........
>   Object 0xfffff800affe7b10:  00 00 00 00 00 75 b4 c0 ff ff f8 00 af fe 7b 18 .....u´Àÿÿø.¯þ{.
>   Object 0xfffff800affe7b20:  ff ff f8 00 af fe 7b 18 00 00 00 00 00 00 00 00 ÿÿø.¯þ{.........
>   Object 0xfffff800affe7b30:  00 00 00 00 00 00 00 00 ff ff f8 00 af fe 7a e0 ........ÿÿø.¯þzà
>   Object 0xfffff800affe7b40:  00 00 00 00 00 06 02 00 30 7f fc 00 00 00 c0 00 ........0.ü...À.
>   Object 0xfffff800affe7b50:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 ...............@
>   Object 0xfffff800affe7b60:  00 00 01 0c 00 00 00 09 00 00 00 24 00 00 00 07 ...........$....
>   Object 0xfffff800affe7b70:  00 00 00 00 00 06 02 00                         ........        
>  Redzone 0xfffff800affe7b78:  30 7f fc 00 00 00 40 00                         0.ü...@.        
>  Padding 0xfffff800affe7bb8:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ        
> Call Trace:
>  [00000000004bde04] print_trailer+0xc4/0x160
>  [00000000004be4a8] check_bytes_and_report+0xa8/0xe0
>  [00000000004be528] check_object+0x48/0x260
>  [00000000004bee0c] __slab_free+0x2ac/0x440
>  [00000000004c1ce8] kmem_cache_free+0x88/0xe0
>  [00000000005456ec] nfs_destroy_inode+0xc/0x20
>  [00000000004d75cc] destroy_inode+0x2c/0x60
>  [00000000004d7cec] dispose_list+0xac/0x100
>  [00000000004d8614] shrink_icache_memory+0x1d4/0x2e0
>  [00000000004a29cc] shrink_slab+0x16c/0x200
>  [00000000004a2df8] kswapd+0x398/0x5a0
>  [000000000047381c] kthread+0x3c/0x80
>  [0000000000427010] kernel_thread+0x30/0x60
>  [0000000000473b10] kthreadd+0x170/0x200
> FIX nfs_inode_cache: Restoring 0xfffff800affe7b78-0xfffff800affe7b7f=0xcc
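
The three reports above all have the same shape: a byte range that should hold a fixed debug fill pattern, and the first byte actually found there. As a reading aid, here is a minimal Python sketch that decodes such an "INFO: ... First byte ... instead of ..." line. The pattern values are the ones from the kernel's include/linux/poison.h; the function and dictionary names are made up for this example.

```python
import re

# Debug fill bytes from include/linux/poison.h (descriptions paraphrased).
SLUB_BYTES = {
    0x5A: "POISON_INUSE (padding / uninitialised object)",
    0x6B: "POISON_FREE (freed object)",
    0xA5: "POISON_END (last byte of a freed object)",
    0xBB: "SLUB_RED_INACTIVE (redzone of a free object)",
    0xCC: "SLUB_RED_ACTIVE (redzone of an allocated object)",
}

INFO_RE = re.compile(
    r"INFO: 0x([0-9a-f]+)-0x([0-9a-f]+)\. "
    r"First byte 0x([0-9a-f]+) instead of 0x([0-9a-f]+)"
)

def describe_overwrite(line):
    """Return length of the damaged range, the byte found and the pattern expected."""
    m = INFO_RE.search(line)
    if m is None:
        return None
    start, end, found, expected = (int(g, 16) for g in m.groups())
    return {
        "length": end - start + 1,  # the reported range is inclusive
        "found": found,
        "expected": SLUB_BYTES.get(expected, "unknown pattern"),
    }

tsb = describe_overwrite(
    "INFO: 0xfffff800bdb7fcc0-0xfffff800bdb7fcff. First byte 0x0 instead of 0x5a"
)
nfs = describe_overwrite(
    "INFO: 0xfffff800affe7b78-0xfffff800affe7b7f. First byte 0x30 instead of 0xcc"
)
print(tsb)
print(nfs)
```

For the tsb_16KB report this gives a 64-byte damaged range in the padding; for nfs_inode_cache it is the 8-byte redzone of a live object.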

And ... boom :)

Unable to handle kernel NULL pointer dereference
tsk->{mm,active_mm}->context = 00000000000004b2
tsk->{mm,active_mm}->pgd = fffff8009848e000
              \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
                 \__U_/
kswapd0(156): Oops [#1]
TSTATE: 0000009980f09605 TPC: 0000000000480094 TNPC: 0000000000480098 Y: 00000000    Not tainted
TPC: <__lock_acquire+0x34/0xb00>
g0: fffff800bf759090 g1: 000000000000000f g2: 0000000000000001 g3: fffff800bf348000
g4: fffff800bf331b00 g5: fffff800bf6d6000 g6: fffff800bf348000 g7: 0000000000000050
o0: 0000000000002000 o1: 00000000004a0cac o2: fff8000000000000 o3: fffff800bf331b00
o4: 0000000000000000 o5: 000000000099a760 sp: fffff800bf34b061 ret_pc: 00000000004a4564
RPC: <__dec_zone_state+0x4/0xa0>
l0: 0000000000000494 l1: fffff80080002000 l2: 0000000000000000 l3: 0000080000000000
l4: fffff800bf331b00 l5: 00000000007c0c00 l6: 0000000000800000 l7: 00000000007a6000
i0: 00000000000000e8 i1: 0000000000000000 i2: 0000000000000000 i3: 0000000000000000
i4: 0000000000000001 i5: 0000000000000000 i6: fffff800bf34b121 i7: 0000000000481c04
I7: <lock_acquire+0x44/0x60>
Caller[0000000000481c04]: lock_acquire+0x44/0x60
Caller[00000000006c25e4]: _spin_lock+0x24/0x40
Caller[00000000004e5f94]: remove_inode_buffers+0x34/0xc0
Caller[00000000004d863c]: shrink_icache_memory+0x1fc/0x2e0
Caller[00000000004a29cc]: shrink_slab+0x16c/0x200
Caller[00000000004a2df8]: kswapd+0x398/0x5a0
Caller[000000000047381c]: kthread+0x3c/0x80
Caller[0000000000427010]: kernel_thread+0x30/0x60
Caller[0000000000473b10]: kthreadd+0x170/0x200
Instruction DUMP: 01000000  2ace408c  c25e0000 <e65e2008> 22c4c089  c25e0000  f20524b8  80a6602f  184000da 
note: kswapd0[156] exited with preempt_count 1
BUG: sleeping function called from invalid context at /home/mako/linux/lkt/sources/linux-2.6/kernel/nsproxy.c:217
in_atomic(): 1, irqs_disabled(): 0, pid: 156, name: kswapd0
INFO: lockdep is turned off.
Call Trace:
 [0000000000453654] __might_sleep+0xd4/0x120
 [000000000047814c] switch_task_namespaces+0xc/0x60
 [00000000004781a8] exit_task_namespaces+0x8/0x20
 [000000000046010c] do_exit+0x4ec/0x8a0
 [0000000000429190] die_if_kernel+0x150/0x300
 [0000000000445030] unhandled_fault+0x70/0xe0
 [00000000004452f8] do_sparc64_fault+0x1d8/0x5c0
 [000000000040796c] sparc64_realfault_common+0x10/0x20
 [0000000000480094] __lock_acquire+0x34/0xb00
 [0000000000481c04] lock_acquire+0x44/0x60
 [00000000006c25e4] _spin_lock+0x24/0x40
 [00000000004e5f94] remove_inode_buffers+0x34/0xc0
 [00000000004d863c] shrink_icache_memory+0x1fc/0x2e0
 [00000000004a29cc] shrink_slab+0x16c/0x200
 [00000000004a2df8] kswapd+0x398/0x5a0
 [000000000047381c] kthread+0x3c/0x80


So (in reverse order) with gdb we get some more information:

(gdb) l *shrink_icache_memory+0x1fc
0x4d863c is in shrink_icache_memory (/home/mako/linux/lkt/sources/linux-2.6/fs/inode.c:430).
425                             continue;
426                     }
427                     if (inode_has_buffers(inode) || inode->i_data.nrpages) {
428                             __iget(inode);
429                             spin_unlock(&inode_lock);
430                             if (remove_inode_buffers(inode))
431                                     reap += invalidate_mapping_pages(&inode->i_data,
432                                                                     0, -1);
433                             iput(inode);
434                             spin_lock(&inode_lock);
(gdb) l *remove_inode_buffers+0x34
0x4e5f94 is in remove_inode_buffers (/home/mako/linux/lkt/sources/linux-2.6/fs/buffer.c:898).
893             if (inode_has_buffers(inode)) {
894                     struct address_space *mapping = &inode->i_data;
895                     struct list_head *list = &mapping->private_list;
896                     struct address_space *buffer_mapping = mapping->assoc_mapping;
897     
898                     spin_lock(&buffer_mapping->private_lock);
899                     while (!list_empty(list)) {
900                             struct buffer_head *bh = BH_ENTRY(list->next);
901                             if (buffer_dirty(bh)) {
902                                     ret = 0;
(gdb) l *_spin_lock+0x24 
0x6c25e4 is in _spin_lock (/home/mako/linux/lkt/sources/linux-2.6/kernel/spinlock.c:180).
175     EXPORT_SYMBOL(_write_lock_bh);
176     
177     void __lockfunc _spin_lock(spinlock_t *lock)
178     {
179             preempt_disable();
180             spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
181             LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
182     }
183     
184     EXPORT_SYMBOL(_spin_lock);
(gdb) l *lock_acquire+0x44
0x481c04 is in lock_acquire (/home/mako/linux/lkt/sources/linux-2.6/kernel/lockdep.c:2941).
2936    
2937            raw_local_irq_save(flags);
2938            check_flags(flags);
2939    
2940            current->lockdep_recursion = 1;
2941            __lock_acquire(lock, subclass, trylock, read, check,
2942                           irqs_disabled_flags(flags), nest_lock, ip);
2943            current->lockdep_recursion = 0;
2944            raw_local_irq_restore(flags);
2945    }
(gdb) l *__lock_acquire+0x34
0x480094 is in __lock_acquire (/home/mako/linux/lkt/sources/linux-2.6/kernel/lockdep.c:2546).
2541                    printk("turning off the locking correctness validator.\n");
2542                    return 0;
2543            }
2544    
2545            if (!subclass)
2546                    class = lock->class_cache; <---- boom
2547            /*
2548             * Not cached yet or subclass?
2549             */
2550            if (unlikely(!class)) {
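
As a cross-check of the listing above, the faulting instruction can be decoded by hand. The sketch below assumes that the word in angle brackets in the "Instruction DUMP" line (e65e2008) is the instruction at TPC, and that %i0 still holds the first argument of __lock_acquire (the lock pointer) at the time of the trap. The helper name is made up for this example, and it only handles the one instruction form needed here.

```python
# SPARC integer register names: %g0-%g7, %o0-%o7, %l0-%l7, %i0-%i7.
REG_NAMES = [f"%{bank}{n}" for bank in "goli" for n in range(8)]

def decode_ldx_imm(word):
    """Decode a SPARC V9 'ldx [rs1 + simm13], rd' instruction word."""
    op = word >> 30                # 3 = load/store format
    rd = (word >> 25) & 0x1F       # destination register
    op3 = (word >> 19) & 0x3F      # 0x0B = ldx (64-bit load)
    rs1 = (word >> 14) & 0x1F      # base register
    imm_flag = (word >> 13) & 1    # 1 = immediate offset
    simm13 = word & 0x1FFF
    if simm13 & 0x1000:            # sign-extend the 13-bit immediate
        simm13 -= 0x2000
    if op != 3 or op3 != 0x0B or not imm_flag:
        raise ValueError("not an ldx [reg + imm] instruction")
    return REG_NAMES[rs1], simm13, REG_NAMES[rd]

base, offset, dest = decode_ldx_imm(0xE65E2008)
i0 = 0xE8  # value of i0 in the register dump of the oops
print(f"ldx [{base} + {offset}], {dest} -> load from {i0 + offset:#x}")
```

Under those assumptions the load is from 0xe8 + 8 = 0xf0, which matches reading lock->class_cache through a lock pointer of 0xe8. That would fit buffer_mapping (mapping->assoc_mapping) being NULL in remove_inode_buffers(), but this is only an inference from the register dump.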



Hm... and there are processes hanging in uninterruptible sleep.
This would be the bash in which I ran echo 3 > /proc/sys/vm/drop_caches:

bash          D 00000000004d84a0     0 17499      1
Call Trace:
 [00000000006c0a70] mutex_lock_nested+0x110/0x320
 [00000000004d84a0] shrink_icache_memory+0x60/0x2e0
 [00000000004a29cc] shrink_slab+0x16c/0x200
 [00000000004e2a6c] drop_caches_sysctl_handler+0x4c/0x220
 [000000000050f998] proc_sys_call_handler+0x78/0xa0
 [000000000050f9d4] proc_sys_write+0x14/0x40
 [00000000004c4bec] vfs_write+0x6c/0x120
 [00000000004c506c] sys_write+0x2c/0x60
 [0000000000406254] linux_sparc_syscall32+0x34/0x40


And this one would be the sparc-unknown-gnu-linux-gcc compiler running from emerge world:

sparc-unknown D 00000000004d84a0     0 16529      1
Call Trace:
 [00000000006c0a70] mutex_lock_nested+0x110/0x320
 [00000000004d84a0] shrink_icache_memory+0x60/0x2e0
 [00000000004a29cc] shrink_slab+0x16c/0x200
 [00000000004a3224] try_to_free_pages+0x224/0x360
 [000000000049ad54] __alloc_pages_internal+0x194/0x440
 [00000000004bfb9c] __slab_alloc+0x65c/0x720
 [00000000004bffbc] kmem_cache_alloc+0x9c/0xc0
 [00000000004449e8] tsb_grow+0x88/0x440
 [00000000004455dc] do_sparc64_fault+0x4bc/0x5c0
 [000000000040796c] sparc64_realfault_common+0x10/0x20


Both hang in the same place, and it looks like fallout from the NULL pointer dereference:
the kswapd0 thread that oopsed died while still holding iprune_mutex.

# cat /proc/17499/wchan 
shrink_icache_memory

# cat /proc/16529/wchan 
shrink_icache_memory


# cat /proc/17499/stat  
17499 (bash) D 1 17499 17499 0 -1 4194560 776 2910 0 7 5 294 22 97 20 0 1 0 1878410 3874816 267 18446744073709551615 65536 872296 4289715792 4289713784 4158566936 0 0 3293188 2072526587 5080224 0 0 20 0 0 0 0 0 0

# cat /proc/16529/stat  
16529 (sparc-unknown-l) D 1 21719 4519 34818 4519 4196608 701 0 0 0 2 10 0 0 20 0 1 0 1966196 5201920 385 18446744073709551615 65536 404580 4292198832 4292198184 1880286944 0 0 16777216 0 5080224 0 0 20 2 0 0 0 0 0

The wchan value (field 35 of stat) is the same for both: 5080224 -> 0x4D84A0

and that falls somewhere within shrink_icache_memory()
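
The conversion above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the field layout documented in proc(5), where wchan is field 35 of /proc/[pid]/stat. The function name is made up for this example.

```python
def wchan_from_stat(stat_line):
    """Return the wchan value (field 35) from a /proc/[pid]/stat line."""
    # The comm field (field 2) is in parentheses and may contain spaces,
    # so split at the last ')' instead of splitting the whole line.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 35 is rest[32].
    return int(rest[32])

BASH_STAT = (
    "17499 (bash) D 1 17499 17499 0 -1 4194560 776 2910 0 7 5 294 22 97 20 0 1 0 "
    "1878410 3874816 267 18446744073709551615 65536 872296 4289715792 4289713784 "
    "4158566936 0 0 3293188 2072526587 5080224 0 0 20 0 0 0 0 0 0"
)

wchan = wchan_from_stat(BASH_STAT)
print(f"{wchan} -> {wchan:#x}")
```

The same function applied to the stat line of pid 16529 returns the same value.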

# grep shrink_icache /proc/kallsyms -A1
00000000004d8440 t shrink_icache_memory
00000000004d8720 T inode_init_once

The offset is 0x4d84a0 - 0x4d8440 = 0x60,
and that points to:

(gdb) l *shrink_icache_memory+0x60
0x4d84a0 is in shrink_icache_memory (/home/mako/linux/lkt/sources/linux-2.6/fs/inode.c:413).
408             LIST_HEAD(freeable);
409             int nr_pruned = 0;
410             int nr_scanned;
411             unsigned long reap = 0;
412     
413             mutex_lock(&iprune_mutex); <---- here
414             spin_lock(&inode_lock);
415             for (nr_scanned = 0; nr_scanned < nr_to_scan; nr_scanned++) {
416                     struct inode *inode;
417     
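
The symbol lookup done by hand above (find the last symbol at or below the address, then subtract) can be sketched as follows. The input is the two-line kallsyms excerpt from this mail; the function names are made up for this example, and a real lookup would read all of /proc/kallsyms.

```python
import bisect

def parse_kallsyms(text):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols

def resolve(symbols, addr):
    """Return (name, offset) of the closest symbol at or below addr."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None  # address is below the first known symbol
    base, name = symbols[i]
    return name, addr - base

KALLSYMS = """\
00000000004d8440 t shrink_icache_memory
00000000004d8720 T inode_init_once
"""

name, off = resolve(parse_kallsyms(KALLSYMS), 5080224)
print(f"{name}+{off:#x}")
```

With only these two symbols loaded, the wchan value resolves to shrink_icache_memory+0x60, the same expression passed to gdb above.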

I'm not an expert, but it seems that the corruption happens to random memory areas
and thus the system dies in many different wonderful ways ;) Although this might
be just a coincidence.

Hope that helps,

	Mariusz
--
To unsubscribe from this list: send the line "unsubscribe kernel-testers" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
