> You've done a forced module load. No guarantee your kernel is in any
> sane shape if you've done that....

Agreed, but I'm reasonably convinced that module isn't the issue, because it works fine on my other servers...

> Strange failure. Hmmm - i386 arch and fedora - are you running with
> 4k stacks? If so, maybe it blew the stack...

i386 arch, RHEL 5.0.

> # dd if=<device> bs=512 count=1 | od -c

This is what I get now, but the server has since been rebooted and is running OK. What should I be expecting, or rather, what are we looking for in this output at the point of failure?

1+0 records in
1+0 records out
512 bytes (512 B) copied, 3.8e-05 seconds, 13.5 MB/s
0000000 X F S B \0 \0 020 \0 \0 \0 \0 \0 025 324 304 \0
0000020 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000040 330 k 004 8 A 365 F 023 221 035 215 E 277 + v 256
0000060 \0 \0 \0 \0 020 \0 \0 @ \0 \0 \0 \0 \0 \0 \0 200
0000100 \0 \0 \0 \0 \0 \0 \0 201 \0 \0 \0 \0 \0 \0 \0 202
0000120 \0 \0 \0 001 \0 256 246 @ \0 \0 \0 \0 \0 \0 \0
0000140 \0 \0 200 \0 261 204 002 \0 \b \0 \0 002 \0 \0 \0 \0
0000160 \0 \0 \0 \0 \0 \0 \0 \0 \b \t \v 001 030 \0 \0 \0
0000200 \0 \0 \0 \0 \0 023 240 @ \0 \0 \0 \0 \0 004 264 344
0000220 \0 \0 \0 \0 \b 346 311 ( \0 \0 \0 \0 \0 \0 \0 \0
0000240 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000260 \0 \0 \0 \0 \0 \0 \0 002 \0 \0 \0 @ \0 \0 001 \0
0000300 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \b \0 \0 \0 \b
0000320 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
0001000
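For reference, here is a rough sketch of an equivalent check for just the superblock magic, so it can be scripted around the hang; <device> is the same placeholder as in Dave's command, and the script itself is only an illustration, not anything from xfsprogs:

    # Rough sketch: read the first 4 bytes of the device and compare them
    # against the XFS superblock magic "XFSB" (0x58465342).
    # <device> is a placeholder for the real block device.
    magic=$(dd if=<device> bs=4 count=1 2>/dev/null)
    if [ "$magic" = "XFSB" ]; then
        echo "primary superblock magic looks intact"
    else
        echo "primary superblock magic is wrong: '$magic'"
    fi

If the magic reads as anything other than XFSB at the point of failure, that would line up with the "XFS: bad magic number" / "SB validate failed" messages in dmesg. As for the 4k-stacks question, assuming the distro installs a config file under /boot, the running kernel can be checked with something like:

    grep CONFIG_4KSTACKS /boot/config-$(uname -r)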
> why did I flash the controller

I was on firmware version 5.22, which has a known 'lockup' issue that is fixed in the 7.x release; it's a critical fix. So initially I was thinking the lockup caused the XFS errors in dmesg on the panicked server. But now it has hung with the 7.x firmware as well, and the same error shows up in dmesg, which makes me more worried about the filesystem....

Dave Chinner wrote:
> 
> On Tue, Dec 07, 2010 at 07:42:56AM -0800, blacknred wrote:
>> 
>> Hi.....
>> 
>> I get a kernel panic on my HP Proliant Server.
>> 
>> here's trace:
>> 
>> BUG: unable to handle kernel NULL pointer dereference at virtual address
>> 00000052
>> printing eip:
>> *pde = 2c731001
>> Oops: 0000 [#1]
>> SMP
>> 
>> CPU: 2
>> EIP: 0060:[<c0529da1>] Tainted: GF VLI
>                          ^^^^^^^^^^^
> 
> You've done a forced module load. No guarantee your kernel is in any
> sane shape if you've done that....
> 
>> EFLAGS: 00010272 (2.6.33.3-85.fc13.x86_64 #1)
>> EIP is at do_page_fault+0x245/0x617
>> eax: ec5ee000 ebx: 00000000 ecx: eb5de084 edx: 0000000e
>> esi: 00013103 edi: ec5de0b3 ebp: 00000023 esp: ec5de024
>> ds: 008b es: 008b ss: 0078
>> Process bm (pid: 3210, ti=ec622000 task=ec5e3450 task.ti=ec6ee000)
>> Stack: 00000000 00000000 ecd5e0a4 00000024 00000093 f7370000 00000007
>> 00000000
>> ed6ef0a4 c0639569 00000000 0000000f 0000000b 00000000 00000000
>> 00000000
>> 00015106 c0629b9d 00000014 c0305b83 00000000 ec3d40f7 0000000e
>> 00013006
>> Call Trace:
>> [<c0729b9c>] do_page_fault+0x0/0x607
>> [<c0416a79>] error_code+0x49/0x50
>> [<c0629db1>] do_page_fault+0x204/0x607
>> [<c04dd43c>] elv_next_request+0x137/0x234
>> [<f894585c>] do_cciss_request+0x397/0x3a3 [cciss]
>> [<c0629c9c>] do_page_fault+0x0/0x607
>> [<c0415b89>] error_code+0x49/0x40
>> [<c0729ea1>] do_page_fault+0x215/0x607
>> [<c04f5dbd>] deadline_set_request+0x26/0x57
>> [<c0719c9c>] do_page_fault+0x0/0x607
>> [<c0505b89>] error_code+0x39/0x40
>> [<c0628c74>] __down+0x2b/0xbb
>> [<c042fb83>] default_wake_function+0x0/0xc
>> [<c0626b6f>] __down_failed+0x7/0xc
>> [<f9a6f4d5>] .text.lock.xfs_buf+0x17/0x5f [xfs]
>> [<f8a6fe99>] xfs_buf_read_flags+0x48/0x76 [xfs]
>> [<f8a72992>] xfs_trans_read_buf+0x1bb/0x2c0 [xfs]
>> [<f8b3c029>] xfs_btree_read_bufl+0x96/0xb3 [xfs]
>> [<f8b38ce7>] xfs_bmbt_lookup+0x135/0x478 [xfs]
>> [<f8b303b4>] xfs_bmap_add_extent+0xd2b/0x1e30 [xfs]
>> [<f8a36456>] xfs_alloc_update+0x3a/0xbc [xfs]
>> [<f8b21af3>] xfs_alloc_fixup_trees+0x217/0x29a [xfs]
>> [<f8a725ff>] xfs_trans_log_buf+0x49/0x6c [xfs]
>> [<f8a31b96>] xfs_alloc_search_busy+0x20/0xae [xfs]
>> [<f8a5e08c>] xfs_iext_bno_to_ext+0xd8/0x191 [xfs]
>> [<f8a7bed2>] kmem_zone_zalloc+0x1d/0x41 [xfs]
>> [<f8a44165>] xfs_bmapi+0x15fe/0x2016 [xfs]
>> [<f8a4deec>] xfs_iext_bno_to_ext+0x48/0x191 [xfs]
>> [<f8a41a7e>] xfs_bmap_search_multi_extents+0x8a/0xc5 [xfs]
>> [<f8a5507f>] xfs_iomap_write_allocate+0x29c/0x469 [xfs]
>> [<c042e85d>] lock_timer_base+0x15/0x2f
>> [<c042dd28>] del_timer+0x41/0x47
>> [<f8a52d29>] xfs_iomap+0x409/0x71d [xfs]
>> [<f8a6c973>] xfs_map_blocks+0x29/0x52 [xfs]
>> [<f8a6dd6f>] xfs_page_state_convert+0x37b/0xd2e [xfs]
>> [<f8a41358>] xfs_bmap_add_extent+0x1dcf/0x1e30 [xfs]
>> [<f8a34a6e>] xfs_bmap_search_multi_extents+0x8a/0xc5 [xfs]
>> [<f8a31ee9>] xfs_bmapi+0x272/0x2017 [xfs]
>> [<f8a344ba>] xfs_bmapi+0x1853/0x2017 [xfs]
>> [<c05561be>] find_get_pages_tag+0x40/0x75
>> [<f8a6d82b>] xfs_vm_writepage+0x8f/0xd2 [xfs]
>> [<c0593f1c>] mpage_writepages+0x1b7/0x310
>> [<f8a6e89c>] xfs_vm_writepage+0x0/0xc4 [xfs]
>> [<c045c423>] do_writepages+0x20/0x42
>> [<c04936f7>] __writeback_single_inode+0x180/0x2af
>> [<c049389c>] write_inode_now+0x67/0xa7
>> [<c0476955>] file_fsync+0xf/0x6c
>> [<f8b9c75b>] moddw_ioctl+0x420/0x679 [mod_dw]
>> [<c0421f74>] __cond_resched+0x16/0x54
>> [<c04854d8>] do_ioctl+0x47/0x5d
>> [<c0484b41>] vfs_ioctl+0x47b/0x4d3
>> [<c0484af1>] sys_ioctl+0x48/0x4f
>> [<c0504ebd>] sysenter_past_esp+0x46/0x79
> 
> Strange failure. Hmmm - i386 arch and fedora - are you running with
> 4k stacks? If so, maybe it blew the stack...
> 
>> 
>> dmesg shows:
>> XFS: bad magic number
>> XFS: SB validate failed
>> 
>> I rebooted the server, now xfs_repair comes clean.
>> 
>> But the server has hung again after an hour. No panic this time, checked
>> dmesg output and it again
>> shows same
>> XFS: bad magic number
>> XFS: SB validate failed
>> messages.. Any thoughts??
> 
> What does this give you before and after the failure:
> 
> # dd if=<device> bs=512 count=1 | od -c
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs