Hi all,
We've been seeing gfs2 crashes since we upgraded to RHEL 6.4. The
traceback is:
[2013-03-13 08:48:24]BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
[2013-03-13 08:48:24]IP: [<ffffffffa04d66ef>] gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]
[2013-03-13 08:48:24]PGD 0
[2013-03-13 08:48:24]Oops: 0002 [#1] SMP
[2013-03-13 08:48:24]last sysfs file: /sys/devices/pci0000:00/0000:00:06.0/0000:0b:00.0/0000:0c:09.0/0000:0d:00.1/host3/rport-3:0-4/target3:0:3/3:0:3:14/state
[2013-03-13 08:48:24]CPU 0
[2013-03-13 08:48:24]Modules linked in: autofs4 gfs2 dlm configfs sunrpc p4_clockmod freq_table speedstep_lib arpt_mangle arptable_filter arp_tables ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_recent ipt_LOG iptable_filter ip_tables nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 uinput hpwdt hpilo microcode iTCO_wdt iTCO_vendor_support i7300_edac edac_core bnx2 sg shpchp ext4 mbcache jbd2 dm_round_robin sd_mod crc_t10dif sr_mod cdrom qla2xxx scsi_transport_fc scsi_tgt pata_acpi ata_generic ata_piix hpsa cciss radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: mperf]
[2013-03-13 08:48:24]
[2013-03-13 08:48:24]Pid: 9888, comm: smbd Not tainted 2.6.32-358.0.1.el6.x86_64 #1 HP ProLiant DL580 G5
[2013-03-13 08:48:24]RIP: 0010:[<ffffffffa04d66ef>] [<ffffffffa04d66ef>] gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]
[2013-03-13 08:48:24]RSP: 0018:ffff880dce0f9c98 EFLAGS: 00010287
[2013-03-13 08:48:24]RAX: ffff880ff78999a8 RBX: ffff880dae61d7c0 RCX: 00000000006c0762
[2013-03-13 08:48:24]RDX: 00000000006c0762 RSI: 00000000006c075b RDI: ffff88100b2b6440
[2013-03-13 08:48:24]RBP: ffff880dce0f9d58 R08: 1050000000000000 R09: f213f3d57bbf820a
[2013-03-13 08:48:24]R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000001000
[2013-03-13 08:48:24]R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
[2013-03-13 08:48:24]FS: 00007f3ac254c7c0(0000) GS:ffff880061a00000(0000) knlGS:0000000000000000
[2013-03-13 08:48:24]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2013-03-13 08:48:24]CR2: 0000000000000060 CR3: 0000000dce153000 CR4: 00000000000007f0
[2013-03-13 08:48:24]DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[2013-03-13 08:48:24]DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[2013-03-13 08:48:24]Process smbd (pid: 9888, threadinfo ffff880dce0f8000, task ffff88100b0b6ae0)
[2013-03-13 08:48:24]Stack:
[2013-03-13 08:48:24] ffff880dce0f9e08 000000000000000a ffff880dce0f9cc8 ffffffff81096c8f
[2013-03-13 08:48:24]<d> ffff880dce0f9dd8 00000007b078eaf8 ffff880dce0f9cd8 ffff88100b2b6000
[2013-03-13 08:48:24]<d> ffff880dce0f9d28 ffffffffa04be2a8 ffff880ff78999a8 0000000000000000
[2013-03-13 08:48:24]Call Trace:
[2013-03-13 08:48:24] [<ffffffff81096c8f>] ? wake_up_bit+0x2f/0x40
[2013-03-13 08:48:24] [<ffffffffa04be2a8>] ? do_promote+0x208/0x330 [gfs2]
[2013-03-13 08:48:24] [<ffffffffa04b106e>] gfs2_setattr_size+0xce/0x210 [gfs2]
[2013-03-13 08:48:24] [<ffffffffa04cd534>] gfs2_setattr+0x214/0x330 [gfs2]
[2013-03-13 08:48:24] [<ffffffffa04cd366>] ? gfs2_setattr+0x46/0x330 [gfs2]
[2013-03-13 08:48:24] [<ffffffff8119e768>] notify_change+0x168/0x340
[2013-03-13 08:48:24] [<ffffffff8117f1e4>] do_truncate+0x64/0xa0
[2013-03-13 08:48:24] [<ffffffff8117f520>] sys_ftruncate+0x120/0x130
[2013-03-13 08:48:24] [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[2013-03-13 08:48:24]Code: 0f 84 c1 fc ff ff e9 41 fb ff ff 48 8b 4d a0 48 8b b1 10 03 00 00 48 8b bd 78 ff ff ff ba 01 00 00 00 e8 75 d6 ff ff 48 89 45 90 <49> 89 45 60 c7 45 9c 01 00 00 00 48 8b 45 90 e9 01 fb ff ff 48
[2013-03-13 08:48:24]RIP [<ffffffffa04d66ef>] gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]
[2013-03-13 08:48:24] RSP <ffff880dce0f9c98>
[2013-03-13 08:48:24]CR2: 0000000000000060
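
In case it helps anyone reading the trace, here is a minimal sketch that decodes the hex value printed after "Oops:". It assumes the standard x86 page-fault error-code bit layout; decode_oops_code is a hypothetical helper written for this post, not part of any kernel or cluster tooling.

```python
# Hypothetical helper (an illustration only): decodes the hex value printed
# after "Oops:" in an x86 page-fault report, assuming the usual bit layout.
def decode_oops_code(code):
    return {
        "page_present": bool(code & 0x01),       # 0 = no page found, 1 = protection fault
        "write": bool(code & 0x02),              # 0 = read access, 1 = write access
        "user_mode": bool(code & 0x04),          # 0 = kernel mode, 1 = user mode
        "reserved_bit": bool(code & 0x08),       # reserved bit set in a page-table entry
        "instruction_fetch": bool(code & 0x10),  # fault happened on an instruction fetch
    }

# Value taken from the "Oops: 0002 [#1] SMP" line in the trace.
info = decode_oops_code(int("0002", 16))
print(info)
```

For our trace this reports a kernel-mode write to a page that is not present, which fits a store through a NULL pointer at offset 0x60 (the CR2 value).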
We've seen this from both svn and smbd now, and on a couple of different
nodes in our cluster. We brought the cluster down last night and ran
fsck.gfs2 on all filesystems, but the problem persists.
Has anyone seen this before? Is there a workaround or should we drop
back to the previous kernel?
-- scooter