Hi,

I am trying to build a two-node cluster for Samba, but I'm having some GFS2 issues. The nodes run as KVM virtual machines (on different hosts) with Gentoo kernel 3.10.7 (I'm not sure exactly which vanilla version it is based on), and I use the cluster-next stack in a fairly minimal configuration (corosync-2 with DLM-4, no Pacemaker).

While testing the cluster with smbtorture everything works fine, but as soon as I let users onto it I get a kernel error that hangs the cluster. (Fencing is set up and working, but it doesn't kick in for some reason.) This is what I get in the kernel log:

Sep 25 07:10:12 fs2 kernel: [18024.888481] GFS2: fsid=fs_clust:homes.1: quota exceeded for user 104202
Sep 25 07:10:18 fs2 kernel: [18030.335727] GFS2: fsid=fs_clust:homes.1: quota exceeded for user 104202
Sep 25 07:10:23 fs2 kernel: [18035.994476] original: gfs2_inode_lookup+0x128/0x240 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.994482] pid: 25317
Sep 25 07:10:23 fs2 kernel: [18035.994484] lock type: 5 req lock state : 3
Sep 25 07:10:23 fs2 kernel: [18035.994491] new: gfs2_inode_lookup+0x128/0x240 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.994493] pid: 25317
Sep 25 07:10:23 fs2 kernel: [18035.994494] lock type: 5 req lock state : 3
Sep 25 07:10:23 fs2 kernel: [18035.994498] G: s:SH n:5/168b15e f:Iqob t:SH d:EX/0 a:0 v:0 r:4 m:50
Sep 25 07:10:23 fs2 kernel: [18035.994506] H: s:SH f:EH e:0 p:25317 [smbd] gfs2_inode_lookup+0x128/0x240 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.994549] general protection fault: 0000 [#1] SMP
Sep 25 07:10:23 fs2 kernel: [18035.994840] Modules linked in: iptable_filter ip_tables x_tables gfs2 dm_mod dlm sctp libcrc32c ipv6 configfs virtio_net i6300esb
Sep 25 07:10:23 fs2 kernel: [18035.995617] CPU: 2 PID: 25317 Comm: smbd Not tainted 3.10.7-gentoo #10
Sep 25 07:10:23 fs2 kernel: [18035.995910] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Sep 25 07:10:23 fs2 kernel: [18035.996191] task: ffff8800b2aa1b00 ti: ffff8800a4a02000 task.ti: ffff8800a4a02000
Sep 25 07:10:23 fs2 kernel: [18035.996546] RIP: 0010:[<ffffffff81053bcb>] [<ffffffff81053bcb>] pid_task+0xb/0x40
Sep 25 07:10:23 fs2 kernel: [18035.996999] RSP: 0018:ffff8800a4a03a10 EFLAGS: 00010206
Sep 25 07:10:23 fs2 kernel: [18035.997253] RAX: 13270cbeaaf4957b RBX: ffff8800988f7710 RCX: 0000000000000006
Sep 25 07:10:23 fs2 kernel: [18035.997592] RDX: 0000000000000007 RSI: 0000000000000000 RDI: 13270cbeaaf4957b
Sep 25 07:10:23 fs2 kernel: [18035.997934] RBP: ffff8800a4b43ba0 R08: 000000000000000a R09: 0000000000000000
Sep 25 07:10:23 fs2 kernel: [18035.998019] R10: 0000000000000191 R11: 0000000000000190 R12: 0000000000000000
Sep 25 07:10:23 fs2 kernel: [18035.998019] R13: ffff8800a4b43bf0 R14: ffffffffa0133720 R15: ffff8800995bd988
Sep 25 07:10:23 fs2 kernel: [18035.998019] FS: 00007f1846316740(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000
Sep 25 07:10:23 fs2 kernel: [18035.998019] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 25 07:10:23 fs2 kernel: [18035.998019] CR2: 000000000122aae8 CR3: 000000009880c000 CR4: 00000000000007a0
Sep 25 07:10:23 fs2 kernel: [18035.998019] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 25 07:10:23 fs2 kernel: [18035.998019] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 25 07:10:23 fs2 kernel: [18035.998019] Stack:
Sep 25 07:10:23 fs2 kernel: [18035.998019] ffffffffa0111f07 ffff8800b2aa1e70 ffffffffa011ffd8 0000000000000000
Sep 25 07:10:23 fs2 kernel: [18035.998019] 0000000000000000 0000000000000000 ffff880000000004 0000000000000032
Sep 25 07:10:23 fs2 kernel: [18035.998019] ffff8800a4b43ba0 ffff8800a4b43bf0 00000000626f7149 ffff8800995bd988
Sep 25 07:10:23 fs2 kernel: [18035.998019] Call Trace:
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffffa0111f07>] ? gfs2_dump_glock+0x1c7/0x360 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffffa011ffd8>] ? gfs2_inode_lookup+0x128/0x240 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff81457b2b>] ? printk+0x4f/0x54
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff81132e7d>] ? inode_init_always+0xed/0x1b0
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff8145f245>] ? _raw_spin_lock+0x5/0x10
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffffa01138bb>] ? gfs2_glock_nq+0x30b/0x3e0 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffffa011ffe0>] ? gfs2_inode_lookup+0x130/0x240 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffffa0109195>] ? gfs2_dirent_search+0xe5/0x1c0 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffffa010a4aa>] ? gfs2_dir_search+0x4a/0x80 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffffa01202f7>] ? gfs2_lookupi+0xf7/0x1f0 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffffa01203b9>] ? gfs2_lookupi+0x1b9/0x1f0 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffffa0121821>] ? gfs2_lookup+0x21/0xa0 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff8145f245>] ? _raw_spin_lock+0x5/0x10
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff811315e6>] ? d_alloc+0x76/0x90
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff81124be3>] ? lookup_dcache+0xa3/0xd0
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff811246c4>] ? lookup_real+0x14/0x50
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff81124c42>] ? __lookup_hash+0x32/0x50
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff81459d64>] ? lookup_slow+0x3c/0xa2
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff8145f245>] ? _raw_spin_lock+0x5/0x10
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff81126edf>] ? path_lookupat+0x23f/0x780
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffffa011f169>] ? gfs2_getxattr+0x79/0xa0 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff8112744f>] ? filename_lookup+0x2f/0xc0
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff81125ccc>] ? getname_flags+0xbc/0x1a0
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff8112a32c>] ? user_path_at_empty+0x5c/0xb0
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffffa01122c6>] ? gfs2_holder_uninit+0x16/0x30 [gfs2]
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff8111f8fd>] ? cp_new_stat+0x10d/0x120
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff8111facf>] ? vfs_fstatat+0x3f/0x90
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff8111fc02>] ? SYSC_newstat+0x12/0x30
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff8105ed51>] ? lg_local_lock+0x11/0x20
Sep 25 07:10:23 fs2 kernel: [18035.998019] [<ffffffff8145ff69>] ? system_call_fastpath+0x16/0x1b
Sep 25 07:10:23 fs2 kernel: [18035.998019] Code: 31 f6 48 85 c0 74 0c 8b 50 04 48 c1 e2 05 48 8b 74 10 38 e9 28 ff ff ff 0f 1f 84 00 00 00 00 00 48 85 ff 74 23 89 f6 48 8d 04 f7 <48> 8b 40 08 48 85 c0 74 1c 48 8d 14 76 48 8d 14 d5 30 02 00 00
Sep 25 07:10:23 fs2 kernel: [18035.998019] RIP [<ffffffff81053bcb>] pid_task+0xb/0x40
Sep 25 07:10:23 fs2 kernel: [18035.998019] RSP <ffff8800a4a03a10>
Sep 25 07:10:23 fs2 kernel: [18036.033702] ---[ end trace e5751bbc7d3a8d7c ]---

A quick inspection of the GFS2 code showed that this is caused by an attempt to take a recursive lock: the "original" and "new" requests both come from the same process (pid 25317) for the same glock. Two gfs2_inode_lookup calls are visible in the trace, though I'm not sure that is strictly relevant.
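To make the "recursive lock" diagnosis concrete, here is a minimal C sketch of the kind of check that produces the original:/new: lines: when a new holder is queued on a glock, the existing holders are scanned, and if one already belongs to the same process the request is treated as recursive. This is a simplified user-space model under stated assumptions, not the GFS2 source: the names model_glock, model_holder, model_enqueue and MAX_HOLDERS are hypothetical, the real check has extra conditions this model ignores, and where the model simply returns false the kernel instead prints the diagnostics, dumps the glock (the G: and H: lines in the log) and treats it as fatal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified model of a glock holder queue.
 * None of these names exist in GFS2; they only illustrate the idea. */
#define MAX_HOLDERS 8

struct model_holder {
    int owner_pid;       /* process that queued this request */
    const char *caller;  /* call site, e.g. "gfs2_inode_lookup+0x128/0x240" */
};

struct model_glock {
    struct model_holder holders[MAX_HOLDERS];
    size_t nr_holders;
};

/* Queue gh on gl. Returns false (and prints both call sites) if the
 * same pid already has a holder queued on this glock, i.e. a recursive
 * request, or if the fixed-size queue is full. Returns true otherwise. */
bool model_enqueue(struct model_glock *gl, const struct model_holder *gh)
{
    assert(gl != NULL && gh != NULL);

    for (size_t i = 0; i < gl->nr_holders; i++) {
        const struct model_holder *gh2 = &gl->holders[i];
        if (gh2->owner_pid == gh->owner_pid) {
            /* Loosely mirrors the "original:" / "new:" lines in the log. */
            fprintf(stderr, "original: %s\npid: %d\n", gh2->caller, gh2->owner_pid);
            fprintf(stderr, "new: %s\npid: %d\n", gh->caller, gh->owner_pid);
            return false;
        }
    }

    if (gl->nr_holders == MAX_HOLDERS)
        return false;

    gl->holders[gl->nr_holders++] = *gh;
    return true;
}
```

In this model, queueing pid 25317 twice on the same glock is rejected the second time, while a request from a different process (such as pid 25453) is accepted.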
This is followed by a (probably related) trace:

Sep 25 07:10:24 fs2 kernel: [18036.162513] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
Sep 25 07:10:24 fs2 kernel: [18036.164016] IP: [<ffffffffa011f7c6>] gfs2_permission+0x56/0x110 [gfs2]
Sep 25 07:10:24 fs2 kernel: [18036.164016] PGD 989a3067 PUD 9886a067 PMD 0
Sep 25 07:10:24 fs2 kernel: [18036.164016] Oops: 0000 [#2] SMP
Sep 25 07:10:24 fs2 kernel: [18036.164016] Modules linked in: iptable_filter ip_tables x_tables gfs2 dm_mod dlm sctp libcrc32c ipv6 configfs virtio_net i6300esb
Sep 25 07:10:24 fs2 kernel: [18036.164016] CPU: 1 PID: 25453 Comm: smbd Tainted: G D 3.10.7-gentoo #10
Sep 25 07:10:24 fs2 kernel: [18036.164016] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Sep 25 07:10:24 fs2 kernel: [18036.164016] task: ffff8800afca0d80 ti: ffff8800a4a02000 task.ti: ffff8800a4a02000
Sep 25 07:10:24 fs2 kernel: [18036.164016] RIP: 0010:[<ffffffffa011f7c6>] [<ffffffffa011f7c6>] gfs2_permission+0x56/0x110 [gfs2]
Sep 25 07:10:24 fs2 kernel: [18036.164016] RSP: 0018:ffff8800a4a03c08 EFLAGS: 00010286
Sep 25 07:10:24 fs2 kernel: [18036.164016] RAX: ffffffff8145f245 RBX: 0000000000000040 RCX: 0000000000000000
Sep 25 07:10:24 fs2 kernel: [18036.164016] RDX: ffff8800b5668f00 RSI: 0000000000000001 RDI: ffff8800a4b97ddc
Sep 25 07:10:24 fs2 kernel: [18036.164016] RBP: ffff880099486e60 R08: 0000000000000061 R09: 0000000000000000
Sep 25 07:10:24 fs2 kernel: [18036.164016] R10: ff48ad3954b34002 R11: d09e94939e979e85 R12: ffff8800a4b97ddc
Sep 25 07:10:24 fs2 kernel: [18036.164016] R13: 0000000000000001 R14: ffff8800a4b97df8 R15: ffff8800afca0d80
Sep 25 07:10:24 fs2 kernel: [18036.164016] FS: 00007f1846316740(0000) GS:ffff8800bfa80000(0000) knlGS:0000000000000000
Sep 25 07:10:24 fs2 kernel: [18036.164016] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 25 07:10:24 fs2 kernel: [18036.164016] CR2: 0000000000000070 CR3: 000000009880c000 CR4: 00000000000007a0
Sep 25 07:10:24 fs2 kernel: [18036.164016] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 25 07:10:24 fs2 kernel: [18036.164016] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 25 07:10:24 fs2 kernel: [18036.164016] Stack:
Sep 25 07:10:24 fs2 kernel: [18036.164016] ffff8800994e0c00 ffffffff81125a8b ffff8800a4a03c18 ffff8800a4a03c18
Sep 25 07:10:24 fs2 kernel: [18036.164016] 0000000000000000 ffff8800bbba8d20 0000000800000003 0000000200000000
Sep 25 07:10:24 fs2 kernel: [18036.164016] ffffffff8145f245 ffffffff8112ff5e ffff8800a4a03e08 0000000000000007
Sep 25 07:10:24 fs2 kernel: [18036.164016] Call Trace:
Sep 25 07:10:24 fs2 kernel: [18036.164016] [<ffffffff81125a8b>] ? lookup_fast+0x1ab/0x2f0
Sep 25 07:10:24 fs2 kernel: [18036.164016] [<ffffffff8145f245>] ? _raw_spin_lock+0x5/0x10
Sep 25 07:10:24 fs2 kernel: [18036.164016] [<ffffffff8112ff5e>] ? dput+0x17e/0x220
Sep 25 07:10:24 fs2 kernel: [18036.164016] [<ffffffff8112610a>] ? link_path_walk+0x23a/0x8b0
Sep 25 07:10:24 fs2 kernel: [18036.164016] [<ffffffff81126b9c>] ? path_init+0x30c/0x410
Sep 25 07:10:24 fs2 kernel: [18036.164016] [<ffffffff81126cf2>] ? path_lookupat+0x52/0x780
Sep 25 07:10:24 fs2 kernel: [18036.164016] [<ffffffff8112744f>] ? filename_lookup+0x2f/0xc0
Sep 25 07:10:24 fs2 kernel: [18036.164016] [<ffffffff81125ccc>] ? getname_flags+0xbc/0x1a0
Sep 25 07:10:24 fs2 kernel: [18036.164016] [<ffffffff8112a32c>] ? user_path_at_empty+0x5c/0xb0
Sep 25 07:10:24 fs2 kernel: [18036.164016] [<ffffffff8111facf>] ? vfs_fstatat+0x3f/0x90
Sep 25 07:10:24 fs2 kernel: [18036.164016] [<ffffffff8111fc02>] ? SYSC_newstat+0x12/0x30
Sep 25 07:10:24 fs2 kernel: [18036.164016] [<ffffffff8111b420>] ? SyS_read+0x50/0xa0
Sep 25 07:10:24 fs2 kernel: [18036.164016] [<ffffffff8145ff69>] ? system_call_fastpath+0x16/0x1b
Sep 25 07:10:24 fs2 kernel: [18036.164016] Code: c6 50 65 48 8b 04 25 80 b7 00 00 48 8b 90 40 02 00 00 4c 39 f3 75 14 eb 1a 0f 1f 40 00 48 3b 53 18 74 12 48 8b 1b 49 39 de 74 08 <48> 8b 43 30 a8 40 75 ea 31 db 4c 89 e7 e8 e8 78 f0 e0 66 90 45
Sep 25 07:10:24 fs2 kernel: [18036.164016] RIP [<ffffffffa011f7c6>] gfs2_permission+0x56/0x110 [gfs2]
Sep 25 07:10:24 fs2 kernel: [18036.164016] RSP <ffff8800a4a03c08>
Sep 25 07:10:24 fs2 kernel: [18036.164016] CR2: 0000000000000070
Sep 25 07:10:24 fs2 kernel: [18036.218133] ---[ end trace e5751bbc7d3a8d7d ]---

Afterwards the log is filled with "INFO: rcu_sched self-detected stall" messages and NMI-triggered backtraces.

Is this a known (and fixed) bug? Is there a way to prevent it?

Thanks,
Pavel Herrmann

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster