Hi,

On Wed, 2013-09-25 at 16:25 +0200, Pavel Herrmann wrote:
> Hi
>
> I am trying to build a two-node cluster for samba, but I'm having some GFS2
> issues.
>
> The nodes themselves run as virtual machines in KVM (on different hosts), use
> gentoo kernel 3.10.7 (not sure what exact version of vanilla it is based on),
> and I use the cluster-next stack in somewhat minimal configuration (corosync-2
> with DLM-4, no pacemaker).
>
> While testing my cluster (using smbtorture), everything works fine, but the
> moment I let users onto it, I get a kernel error that hangs the cluster
> (fencing is set up and working, but doesn't kick in for some reason)
>
I suspect that this has been fixed, but without knowing exactly which
kernel version this is and what patches have been applied to it, I'm
afraid that I'm a bit in the dark. I don't think we've seen anything like
this recently relating to type 5 glocks,

Steve.

> this is what I get in kernel log:
>
> Sep 25 07:10:12 fs2 kernel: [18024.888481] GFS2: fsid=fs_clust:homes.1: quota exceeded for user 104202
> Sep 25 07:10:18 fs2 kernel: [18030.335727] GFS2: fsid=fs_clust:homes.1: quota exceeded for user 104202
> Sep 25 07:10:23 fs2 kernel: [18035.994476] original: gfs2_inode_lookup+0x128/0x240 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.994482] pid: 25317
> Sep 25 07:10:23 fs2 kernel: [18035.994484] lock type: 5 req lock state : 3
> Sep 25 07:10:23 fs2 kernel: [18035.994491] new: gfs2_inode_lookup+0x128/0x240 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.994493] pid: 25317
> Sep 25 07:10:23 fs2 kernel: [18035.994494] lock type: 5 req lock state : 3
> Sep 25 07:10:23 fs2 kernel: [18035.994498] G: s:SH n:5/168b15e f:Iqob t:SH d:EX/0 a:0 v:0 r:4 m:50
> Sep 25 07:10:23 fs2 kernel: [18035.994506] H: s:SH f:EH e:0 p:25317 [smbd] gfs2_inode_lookup+0x128/0x240 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.994549] general protection fault: 0000 [#1] SMP
> Sep 25 07:10:23 fs2 kernel: [18035.994840] Modules linked in: iptable_filter ip_tables x_tables gfs2 dm_mod dlm sctp libcrc32c ipv6 configfs virtio_net i6300esb
> Sep 25 07:10:23 fs2 kernel: [18035.995617] CPU: 2 PID: 25317 Comm: smbd Not tainted 3.10.7-gentoo #10
> Sep 25 07:10:23 fs2 kernel: [18035.995910] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> Sep 25 07:10:23 fs2 kernel: [18035.996191] task: ffff8800b2aa1b00 ti: ffff8800a4a02000 task.ti: ffff8800a4a02000
> Sep 25 07:10:23 fs2 kernel: [18035.996546] RIP: 0010:[<ffffffff81053bcb>] [<ffffffff81053bcb>] pid_task+0xb/0x40
> Sep 25 07:10:23 fs2 kernel: [18035.996999] RSP: 0018:ffff8800a4a03a10 EFLAGS: 00010206
> Sep 25 07:10:23 fs2 kernel: [18035.997253] RAX: 13270cbeaaf4957b RBX: ffff8800988f7710 RCX: 0000000000000006
> Sep 25 07:10:23 fs2 kernel: [18035.997592] RDX: 0000000000000007 RSI: 0000000000000000 RDI: 13270cbeaaf4957b
> Sep 25 07:10:23 fs2 kernel: [18035.997934] RBP: ffff8800a4b43ba0 R08: 000000000000000a R09: 0000000000000000
> Sep 25 07:10:23 fs2 kernel: [18035.998019] R10: 0000000000000191 R11: 0000000000000190 R12: 0000000000000000
> Sep 25 07:10:23 fs2 kernel: [18035.998019] R13: ffff8800a4b43bf0 R14: ffffffffa0133720 R15: ffff8800995bd988
> Sep 25 07:10:23 fs2 kernel: [18035.998019] FS: 00007f1846316740(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000
> Sep 25 07:10:23 fs2 kernel: [18035.998019] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Sep 25 07:10:23 fs2 kernel: [18035.998019] CR2: 000000000122aae8 CR3: 000000009880c000 CR4: 00000000000007a0
> Sep 25 07:10:23 fs2 kernel: [18035.998019] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Sep 25 07:10:23 fs2 kernel: [18035.998019] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Sep 25 07:10:23 fs2 kernel: [18035.998019] Stack:
> Sep 25 07:10:23 fs2 kernel: [18035.998019] ffffffffa0111f07 ffff8800b2aa1e70 ffffffffa011ffd8 0000000000000000
> Sep 25 07:10:23 fs2 kernel: [18035.998019] 0000000000000000 0000000000000000 ffff880000000004 0000000000000032
> Sep 25 07:10:23 fs2 kernel: [18035.998019] ffff8800a4b43ba0 ffff8800a4b43bf0 00000000626f7149 ffff8800995bd988
> Sep 25 07:10:23 fs2 kernel: [18035.998019] Call Trace:
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffffa0111f07>] ? gfs2_dump_glock+0x1c7/0x360 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffffa011ffd8>] ? gfs2_inode_lookup+0x128/0x240 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff81457b2b>] ? printk+0x4f/0x54
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff81132e7d>] ? inode_init_always+0xed/0x1b0
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff8145f245>] ? _raw_spin_lock+0x5/0x10
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffffa01138bb>] ? gfs2_glock_nq+0x30b/0x3e0 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffffa011ffe0>] ? gfs2_inode_lookup+0x130/0x240 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffffa0109195>] ? gfs2_dirent_search+0xe5/0x1c0 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffffa010a4aa>] ? gfs2_dir_search+0x4a/0x80 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffffa01202f7>] ? gfs2_lookupi+0xf7/0x1f0 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffffa01203b9>] ? gfs2_lookupi+0x1b9/0x1f0 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffffa0121821>] ? gfs2_lookup+0x21/0xa0 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff8145f245>] ? _raw_spin_lock+0x5/0x10
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff811315e6>] ? d_alloc+0x76/0x90
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff81124be3>] ? lookup_dcache+0xa3/0xd0
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff811246c4>] ? lookup_real+0x14/0x50
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff81124c42>] ? __lookup_hash+0x32/0x50
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff81459d64>] ? lookup_slow+0x3c/0xa2
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff8145f245>] ? _raw_spin_lock+0x5/0x10
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff81126edf>] ? path_lookupat+0x23f/0x780
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffffa011f169>] ? gfs2_getxattr+0x79/0xa0 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff8112744f>] ? filename_lookup+0x2f/0xc0
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff81125ccc>] ? getname_flags+0xbc/0x1a0
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff8112a32c>] ? user_path_at_empty+0x5c/0xb0
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffffa01122c6>] ? gfs2_holder_uninit+0x16/0x30 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff8111f8fd>] ? cp_new_stat+0x10d/0x120
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff8111facf>] ? vfs_fstatat+0x3f/0x90
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff8111fc02>] ? SYSC_newstat+0x12/0x30
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff8105ed51>] ? lg_local_lock+0x11/0x20
> Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff8145ff69>] ? system_call_fastpath+0x16/0x1b
> Sep 25 07:10:23 fs2 kernel: [18035.998019] Code: 31 f6 48 85 c0 74 0c 8b 50 04 48 c1 e2 05 48 8b 74 10 38 e9 28 ff ff ff 0f 1f 84 00 00 00 00 00 48 85 ff 74 23 89 f6 48 8d 04 f7 <48> 8b 40 08 48 85 c0 74 1c 48 8d 14 76 48 8d 14 d5 30 02 00 00
> Sep 25 07:10:23 fs2 kernel: [18035.998019] RIP [<ffffffff81053bcb>] pid_task+0xb/0x40
> Sep 25 07:10:23 fs2 kernel: [18035.998019] RSP <ffff8800a4a03a10>
> Sep 25 07:10:23 fs2 kernel: [18036.033702] ---[ end trace e5751bbc7d3a8d7c ]---
>
>
> Simple inspection of the gfs2 code showed this is caused by attempting a
> recursive lock. Two gfs2_inode_lookups are visible in the trace, not sure
> that is strictly relevant though.
>
> This is followed by a (probably related) trace:
>
>
> Sep 25 07:10:24 fs2 kernel: [18036.162513] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
> Sep 25 07:10:24 fs2 kernel: [18036.164016] IP: [<ffffffffa011f7c6>] gfs2_permission+0x56/0x110 [gfs2]
> Sep 25 07:10:24 fs2 kernel: [18036.164016] PGD 989a3067 PUD 9886a067 PMD 0
> Sep 25 07:10:24 fs2 kernel: [18036.164016] Oops: 0000 [#2] SMP
> Sep 25 07:10:24 fs2 kernel: [18036.164016] Modules linked in: iptable_filter ip_tables x_tables gfs2 dm_mod dlm sctp libcrc32c ipv6 configfs virtio_net i6300esb
> Sep 25 07:10:24 fs2 kernel: [18036.164016] CPU: 1 PID: 25453 Comm: smbd Tainted: G D 3.10.7-gentoo #10
> Sep 25 07:10:24 fs2 kernel: [18036.164016] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> Sep 25 07:10:24 fs2 kernel: [18036.164016] task: ffff8800afca0d80 ti: ffff8800a4a02000 task.ti: ffff8800a4a02000
> Sep 25 07:10:24 fs2 kernel: [18036.164016] RIP: 0010:[<ffffffffa011f7c6>] [<ffffffffa011f7c6>] gfs2_permission+0x56/0x110 [gfs2]
> Sep 25 07:10:24 fs2 kernel: [18036.164016] RSP: 0018:ffff8800a4a03c08 EFLAGS: 00010286
> Sep 25 07:10:24 fs2 kernel: [18036.164016] RAX: ffffffff8145f245 RBX: 0000000000000040 RCX: 0000000000000000
> Sep 25 07:10:24 fs2 kernel: [18036.164016] RDX: ffff8800b5668f00 RSI: 0000000000000001 RDI: ffff8800a4b97ddc
> Sep 25 07:10:24 fs2 kernel: [18036.164016] RBP: ffff880099486e60 R08: 0000000000000061 R09: 0000000000000000
> Sep 25 07:10:24 fs2 kernel: [18036.164016] R10: ff48ad3954b34002 R11: d09e94939e979e85 R12: ffff8800a4b97ddc
> Sep 25 07:10:24 fs2 kernel: [18036.164016] R13: 0000000000000001 R14: ffff8800a4b97df8 R15: ffff8800afca0d80
> Sep 25 07:10:24 fs2 kernel: [18036.164016] FS: 00007f1846316740(0000) GS:ffff8800bfa80000(0000) knlGS:0000000000000000
> Sep 25 07:10:24 fs2 kernel: [18036.164016] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Sep 25 07:10:24 fs2 kernel: [18036.164016] CR2: 0000000000000070 CR3: 000000009880c000 CR4: 00000000000007a0
> Sep 25 07:10:24 fs2 kernel: [18036.164016] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Sep 25 07:10:24 fs2 kernel: [18036.164016] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Sep 25 07:10:24 fs2 kernel: [18036.164016] Stack:
> Sep 25 07:10:24 fs2 kernel: [18036.164016] ffff8800994e0c00 ffffffff81125a8b ffff8800a4a03c18 ffff8800a4a03c18
> Sep 25 07:10:24 fs2 kernel: [18036.164016] 0000000000000000 ffff8800bbba8d20 0000000800000003 0000000200000000
> Sep 25 07:10:24 fs2 kernel: [18036.164016] ffffffff8145f245 ffffffff8112ff5e ffff8800a4a03e08 0000000000000007
> Sep 25 07:10:24 fs2 kernel: [18036.164016] Call Trace:
> Sep 25 07:10:24 fs2 kernel: [18036.164016] [<ffffffff81125a8b>] ? lookup_fast+0x1ab/0x2f0
> Sep 25 07:10:24 fs2 kernel: [18036.164016] [<ffffffff8145f245>] ? _raw_spin_lock+0x5/0x10
> Sep 25 07:10:24 fs2 kernel: [18036.164016] [<ffffffff8112ff5e>] ? dput+0x17e/0x220
> Sep 25 07:10:24 fs2 kernel: [18036.164016] [<ffffffff8112610a>] ? link_path_walk+0x23a/0x8b0
> Sep 25 07:10:24 fs2 kernel: [18036.164016] [<ffffffff81126b9c>] ? path_init+0x30c/0x410
> Sep 25 07:10:24 fs2 kernel: [18036.164016] [<ffffffff81126cf2>] ? path_lookupat+0x52/0x780
> Sep 25 07:10:24 fs2 kernel: [18036.164016] [<ffffffff8112744f>] ? filename_lookup+0x2f/0xc0
> Sep 25 07:10:24 fs2 kernel: [18036.164016] [<ffffffff81125ccc>] ? getname_flags+0xbc/0x1a0
> Sep 25 07:10:24 fs2 kernel: [18036.164016] [<ffffffff8112a32c>] ? user_path_at_empty+0x5c/0xb0
> Sep 25 07:10:24 fs2 kernel: [18036.164016] [<ffffffff8111facf>] ? vfs_fstatat+0x3f/0x90
> Sep 25 07:10:24 fs2 kernel: [18036.164016] [<ffffffff8111fc02>] ? SYSC_newstat+0x12/0x30
> Sep 25 07:10:24 fs2 kernel: [18036.164016] [<ffffffff8111b420>] ? SyS_read+0x50/0xa0
> Sep 25 07:10:24 fs2 kernel: [18036.164016] [<ffffffff8145ff69>] ? system_call_fastpath+0x16/0x1b
> Sep 25 07:10:24 fs2 kernel: [18036.164016] Code: c6 50 65 48 8b 04 25 80 b7 00 00 48 8b 90 40 02 00 00 4c 39 f3 75 14 eb 1a 0f 1f 40 00 48 3b 53 18 74 12 48 8b 1b 49 39 de 74 08 <48> 8b 43 30 a8 40 75 ea 31 db 4c 89 e7 e8 e8 78 f0 e0 66 90 45
> Sep 25 07:10:24 fs2 kernel: [18036.164016] RIP [<ffffffffa011f7c6>] gfs2_permission+0x56/0x110 [gfs2]
> Sep 25 07:10:24 fs2 kernel: [18036.164016] RSP <ffff8800a4a03c08>
> Sep 25 07:10:24 fs2 kernel: [18036.164016] CR2: 0000000000000070
> Sep 25 07:10:24 fs2 kernel: [18036.218133] ---[ end trace e5751bbc7d3a8d7d ]---
>
> Afterwards the log is filled with "INFO: rcu_sched self-detected stall" and
> NMI-caused backtraces
>
> Is this a known-and-fixed bug? Is there a way to prevent this?
>
>
> thanks
> Pavel Herrmann
>

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
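
For anyone reading the glock dump in the first trace, here is a rough Python sketch that splits the "G:" line into its fields and names the glock type. The function name parse_glock_line and both lookup tables are my own illustration, not anything from gfs2 itself. The type and flag names are filled in from my reading of the kernel's GFS2 glock documentation and cover only the values seen in this log plus the common types, so treat them as assumptions and check them against the documentation for your kernel version.

```python
# Hypothetical helper for reading a GFS2 glock dump line such as the "G:" line
# in the log above. The tables are assumptions based on the kernel's GFS2
# glock documentation; verify them against your kernel version.

GLOCK_TYPES = {
    1: "trans", 2: "inode", 3: "rgrp", 4: "meta",
    5: "iopen", 6: "flock", 8: "quota", 9: "journal",
}

# Only the flag letters that appear in this report ("Iqob") are listed.
GLOCK_FLAGS = {"I": "initial", "q": "queued", "o": "object", "b": "blocking"}


def parse_glock_line(line):
    """Split a 'G:' glock dump line into a dict of decoded fields."""
    # Everything after the "G:" marker is a run of key:value tokens.
    body = line.split("G:", 1)[1]
    fields = dict(token.split(":", 1) for token in body.split())

    # "n" holds "<type>/<number>"; the number is printed in hexadecimal.
    gl_type, gl_number = fields["n"].split("/")
    return {
        "state": fields["s"],
        "type": int(gl_type),
        "type_name": GLOCK_TYPES.get(int(gl_type), "unknown"),
        "number": int(gl_number, 16),
        # Unknown flag letters are passed through unchanged.
        "flags": [GLOCK_FLAGS.get(c, c) for c in fields["f"]],
        "target": fields["t"],
        "demote": fields["d"],
    }


if __name__ == "__main__":
    sample = "G: s:SH n:5/168b15e f:Iqob t:SH d:EX/0 a:0 v:0 r:4 m:50"
    for key, value in parse_glock_line(sample).items():
        print(key, "=", value)
```

Under those assumptions, the "n:5/..." field in the dump matches the "lock type: 5" lines in the log, which is the glock type the reply above refers to.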