GFS2 crashes after upgrade to RHEL 6.4

Scooter Morris <scooter@xxxxxxxxxxxx> · Wed, 13 Mar 2013 10:09:38 -0700

Hi all,
    We've been seeing GFS2 crashes since we upgraded to RHEL 6.4. The
traceback is:

[2013-03-13 08:48:24]BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
[2013-03-13 08:48:24]IP: [<ffffffffa04d66ef>] gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]
[2013-03-13 08:48:24]PGD 0
[2013-03-13 08:48:24]Oops: 0002 [#1] SMP
[2013-03-13 08:48:24]last sysfs file: /sys/devices/pci0000:00/0000:00:06.0/0000:0b:00.0/0000:0c:09.0/0000:0d:00.1/host3/rport-3:0-4/target3:0:3/3:0:3:14/state
[2013-03-13 08:48:24]CPU 0
[2013-03-13 08:48:24]Modules linked in: autofs4 gfs2 dlm configfs sunrpc p4_clockmod freq_table speedstep_lib arpt_mangle arptable_filter arp_tables ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_recent ipt_LOG iptable_filter ip_tables nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 uinput hpwdt hpilo microcode iTCO_wdt iTCO_vendor_support i7300_edac edac_core bnx2 sg shpchp ext4 mbcache jbd2 dm_round_robin sd_mod crc_t10dif sr_mod cdrom qla2xxx scsi_transport_fc scsi_tgt pata_acpi ata_generic ata_piix hpsa cciss radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: mperf]
[2013-03-13 08:48:24]
[2013-03-13 08:48:24]Pid: 9888, comm: smbd Not tainted 2.6.32-358.0.1.el6.x86_64 #1 HP ProLiant DL580 G5
[2013-03-13 08:48:24]RIP: 0010:[<ffffffffa04d66ef>] [<ffffffffa04d66ef>] gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]
[2013-03-13 08:48:24]RSP: 0018:ffff880dce0f9c98  EFLAGS: 00010287
[2013-03-13 08:48:24]RAX: ffff880ff78999a8 RBX: ffff880dae61d7c0 RCX: 00000000006c0762
[2013-03-13 08:48:24]RDX: 00000000006c0762 RSI: 00000000006c075b RDI: ffff88100b2b6440
[2013-03-13 08:48:24]RBP: ffff880dce0f9d58 R08: 1050000000000000 R09: f213f3d57bbf820a
[2013-03-13 08:48:24]R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000001000
[2013-03-13 08:48:24]R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
[2013-03-13 08:48:24]FS:  00007f3ac254c7c0(0000) GS:ffff880061a00000(0000) knlGS:0000000000000000
[2013-03-13 08:48:24]CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2013-03-13 08:48:24]CR2: 0000000000000060 CR3: 0000000dce153000 CR4: 00000000000007f0
[2013-03-13 08:48:24]DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[2013-03-13 08:48:24]DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[2013-03-13 08:48:24]Process smbd (pid: 9888, threadinfo ffff880dce0f8000, task ffff88100b0b6ae0)
[2013-03-13 08:48:24]Stack:
[2013-03-13 08:48:24] ffff880dce0f9e08 000000000000000a ffff880dce0f9cc8 ffffffff81096c8f
[2013-03-13 08:48:24]<d> ffff880dce0f9dd8 00000007b078eaf8 ffff880dce0f9cd8 ffff88100b2b6000
[2013-03-13 08:48:24]<d> ffff880dce0f9d28 ffffffffa04be2a8 ffff880ff78999a8 0000000000000000
[2013-03-13 08:48:24]Call Trace:
[2013-03-13 08:48:24] [<ffffffff81096c8f>] ? wake_up_bit+0x2f/0x40
[2013-03-13 08:48:24] [<ffffffffa04be2a8>] ? do_promote+0x208/0x330 [gfs2]
[2013-03-13 08:48:24] [<ffffffffa04b106e>] gfs2_setattr_size+0xce/0x210 [gfs2]
[2013-03-13 08:48:24] [<ffffffffa04cd534>] gfs2_setattr+0x214/0x330 [gfs2]
[2013-03-13 08:48:24] [<ffffffffa04cd366>] ? gfs2_setattr+0x46/0x330 [gfs2]
[2013-03-13 08:48:24] [<ffffffff8119e768>] notify_change+0x168/0x340
[2013-03-13 08:48:24] [<ffffffff8117f1e4>] do_truncate+0x64/0xa0
[2013-03-13 08:48:24] [<ffffffff8117f520>] sys_ftruncate+0x120/0x130
[2013-03-13 08:48:24] [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[2013-03-13 08:48:24]Code: 0f 84 c1 fc ff ff e9 41 fb ff ff 48 8b 4d a0 48 8b b1 10 03 00 00 48 8b bd 78 ff ff ff ba 01 00 00 00 e8 75 d6 ff ff 48 89 45 90 <49> 89 45 60 c7 45 9c 01 00 00 00 48 8b 45 90 e9 01 fb ff ff 48
[2013-03-13 08:48:24]RIP  [<ffffffffa04d66ef>] gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]
[2013-03-13 08:48:24] RSP <ffff880dce0f9c98>
[2013-03-13 08:48:24]CR2: 0000000000000060

We've now seen this from both svn and smbd, and on a couple of different
nodes in our cluster. We brought the cluster down last night and ran
fsck.gfs2 on all filesystems, but the problem persists.

Has anyone seen this before? Is there a workaround, or should we drop
back to the previous kernel?

-- scooter

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster