Hi All,
A few weeks ago I discovered that I'd had an obsolete gfs2 kernel module
loaded. I removed it, which brought gfs2 up to the revision included in
the current kernel. I was hoping that all was well, but yesterday
morning one of the nodes panicked as follows:
original: gfs2_rename+0x19d/0x63b [gfs2]
pid : 12810
lock type: 3 req lock state : 1
new: gfs2_rlist_alloc+0x5c/0x6a [gfs2]
pid: 12810
lock type: 3 req lock state : 1
G: s:EX n:3/33d0327 f:y t:EX d:EX/0 l:0 a:5 r:4
H: s:EX f:H e:0 p:12810 [imap] gfs2_rename+0x19d/0x63b [gfs2]
R: n:54330151 f:05 b:274/274 i:1121
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at fs/gfs2/glock.c:1074
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:0a.0/0000:02:02.0/irq
CPU 1
Modules linked in: nfs fscache nfs_acl lock_dlm gfs2 dlm configfs lockd
sunrpc ipv6 xfrm_nalgo crypto_api ipt_LOG xt_state ip_conntrack
nfnetlink xt_tcpudp iptable_filter ip_tables x_tables 8021q dm_multipath
scsi_dh video backlight sbs i2c_ec button battery asus_acpi
acpi_memhotplug ac parport_pc lp parport i2c_amd756 k8temp ide_cd
i2c_core hwmon sg amd_rng cdrom k8_edac pcspkr tg3 floppy edac_mc e1000
dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero
dm_mirror dm_log dm_mod qla2xxx scsi_transport_fc shpchp mptspi mptscsih
mptbase scsi_transport_spi sd_mod scsi_mod raid1 ext3 jbd uhci_hcd
ohci_hcd ehci_hcd
Pid: 12810, comm: imap Not tainted 2.6.18-164.6.1.el5 #1
RIP: 0010:[<ffffffff8862a6df>] [<ffffffff8862a6df>] :gfs2:gfs2_glock_nq+0x231/0x273
RSP: 0018:ffff8101ba8d9868 EFLAGS: 00010292
RAX: 0000000000000000 RBX: ffff8101ba8d9cb0 RCX: 0000000000000461
RDX: ffff8101ffe27a98 RSI: ffffffff80309c28 RDI: ffffffff80309c20
RBP: ffff8101860b1340 R08: ffffffff80309c28 R09: 000000000000003f
R10: ffff8101ba8d9368 R11: 0000000000000000 R12: ffff8100e87ea590
R13: ffff8100e87ea590 R14: ffff8100ed24e000 R15: 0000000000000000
FS: 00002b18a78ac530(0000) GS:ffff810103901940(0000) knlGS:00000000acbfbb90
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b70cf5cf000 CR3: 00000001b4d4a000 CR4: 00000000000006e0
Process imap (pid: 12810, threadinfo ffff8101ba8d8000, task ffff8101ffe277e0)
Stack: ffff8101860b1340 0000000000000001 ffff8100b3e1b000 ffff8100b3e1a0e8
0000000000000000 ffffffff8862a74e 0000000000000038 ffff810184e88368
0000000000000001 ffffffff800caa0b 0000000000000005 ffff810184e88368
Call Trace:
[<ffffffff8862a74e>] :gfs2:gfs2_glock_nq_m+0x2d/0xf4
[<ffffffff800caa0b>] __kzalloc+0x9/0x21
[<ffffffff88622831>] :gfs2:do_strip+0x175/0x349
[<ffffffff886217e2>] :gfs2:recursive_scan+0xf2/0x175
[<ffffffff886218fe>] :gfs2:trunc_dealloc+0x99/0xe7
[<ffffffff886226bc>] :gfs2:do_strip+0x0/0x349
[<ffffffff80090000>] sched_exit+0xb4/0xb5
[<ffffffff88638dda>] :gfs2:gfs2_delete_inode+0xdd/0x191
[<ffffffff88638d43>] :gfs2:gfs2_delete_inode+0x46/0x191
[<ffffffff88628e77>] :gfs2:gfs2_glock_schedule_for_reclaim+0x5d/0x9a
[<ffffffff88638cfd>] :gfs2:gfs2_delete_inode+0x0/0x191
[<ffffffff8002f48f>] generic_delete_inode+0xc6/0x143
[<ffffffff8863d9a4>] :gfs2:gfs2_inplace_reserve_i+0x63b/0x691
[<ffffffff886248c4>] :gfs2:gfs2_dirent_find_space+0x0/0x41
[<ffffffff88623983>] :gfs2:gfs2_dirent_search+0x147/0x16e
[<ffffffff886377c5>] :gfs2:gfs2_rename+0x3be/0x63b
[<ffffffff88637506>] :gfs2:gfs2_rename+0xff/0x63b
[<ffffffff8863754c>] :gfs2:gfs2_rename+0x145/0x63b
[<ffffffff88637571>] :gfs2:gfs2_rename+0x16a/0x63b
[<ffffffff886375a4>] :gfs2:gfs2_rename+0x19d/0x63b
[<ffffffff88629e29>] :gfs2:gfs2_holder_uninit+0xd/0x1f
[<ffffffff886385bf>] :gfs2:gfs2_permission+0xaf/0xd4
[<ffffffff88633124>] :gfs2:gfs2_drevalidate+0x158/0x214
[<ffffffff8000d902>] permission+0x81/0xc8
[<ffffffff8002a7d9>] vfs_rename+0x2f4/0x471
[<ffffffff80036c20>] sys_renameat+0x180/0x1eb
[<ffffffff800b66f5>] audit_syscall_entry+0x180/0x1b3
[<ffffffff8005d28d>] tracesys+0xd5/0xe0
Code: 0f 0b 68 f8 27 64 88 c2 32 04 be 01 00 00 00 4c 89 ef e8 df
RIP [<ffffffff8862a6df>] :gfs2:gfs2_glock_nq+0x231/0x273
RSP <ffff8101ba8d9868>
<0>Kernel panic - not syncing: Fatal exception
Killed by signal 15.
Running the old code may have damaged the filesystem, so I'm going to
fsck this weekend, but I wanted to post this in case it reveals an
obvious problem to anyone. The "invalid opcode: 0000" makes me think we
ended up executing data as if it were code, but beyond that I'm clueless.
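
A possible cross-check on that guess, sketched in Python and not
authoritative: the "Code:" line starts with 0f 0b, which is the x86 ud2
instruction. Assuming this kernel's BUG() macro expands to
"ud2; pushq $file; ret $line" (my assumption about the 2.6.18 x86_64
headers, not something the log states), the bytes after ud2 should carry
the source line number. The function name decode_bug_site is made up for
this illustration.

```python
# Bytes copied verbatim from the "Code:" line of the oops above.
CODE = "0f 0b 68 f8 27 64 88 c2 32 04 be 01 00 00 00 4c 89 ef e8 df"


def decode_bug_site(code_line):
    """Return (is_ud2, file_ptr, line) for an x86_64 "Code:" byte dump.

    Assumed layout (hypothetical, based on the BUG() expansion named
    in the text): ud2 (0f 0b), push imm32 (68 xx xx xx xx),
    ret imm16 (c2 xx xx). file_ptr and line stay None if the bytes
    do not match that layout.
    """
    raw = bytes.fromhex(code_line.replace(" ", ""))
    is_ud2 = raw[0:2] == b"\x0f\x0b"
    file_ptr = None
    line = None
    if is_ud2 and len(raw) >= 10 and raw[2] == 0x68 and raw[7] == 0xC2:
        # Both immediates are stored little-endian.
        file_ptr = int.from_bytes(raw[3:7], "little")
        line = int.from_bytes(raw[8:10], "little")
    return is_ud2, file_ptr, line


if __name__ == "__main__":
    is_ud2, file_ptr, line = decode_bug_site(CODE)
    print("starts with ud2:", is_ud2)
    if line is not None:
        print("push immediate: 0x%08x" % file_ptr)
        print("ret immediate (line number):", line)
```

If the assumption holds, the decoded line number should agree with the
"Kernel BUG at fs/gfs2/glock.c:1074" message, which would suggest the
invalid opcode is the deliberate trap from a BUG() assertion in
gfs2_glock_nq rather than the CPU wandering into data.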
Thanks,
Allen
--
Allen Belletti
allen@xxxxxxxxxxxxxxx 404-894-6221 Phone
Industrial and Systems Engineering 404-385-2988 Fax
Georgia Institute of Technology