I'm having some problems with clvmd hanging on our 8-node cluster. Once it hangs, any LVM command waits indefinitely. This normally happens when starting up the cluster or when multiple nodes reboot.

After some experimentation I've managed to reproduce it consistently on a smaller 3-node test cluster by stopping clvmd on one node and then running vgscan on another. vgscan then hangs, along with clvmd. Restarting clvmd on the stopped node doesn't wake it up.

Once hung, strace shows 3 clvmd threads: 2 waiting on futexes and one trying to read from /dev/misc/dlm_clvmd. All 3 threads wait indefinitely on these system calls. Here's the last part of the strace:

  [pid 2951] select(1024, [4 6], NULL, NULL, {90, 0}) = 1 (in [4], left {56, 190000})
  [pid 2951] accept(4, {sa_family=AF_FILE, path=@}, [2]) = 5
  [pid 2951] ioctl(6, 0x7805, 0) = 1
  [pid 2951] select(1024, [4 5 6], NULL, NULL, {90, 0}) = 1 (in [5], left {90, 0})
  [pid 2951] read(5, "3\0\0\0\0\0\0\0\0\0\0\0\v\0\0\0\0\4\4P_global\0\0", 4096) = 29
  [pid 2951] futex(0x84d64f4, FUTEX_WAIT, 2, NULL <unfinished ...>

P_global doesn't show up in /proc/cluster/dlm_locks at this point.
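When repeating the reproduction above many times, it can help to detect the hang from a script instead of watching the terminal. This is only a minimal sketch. It assumes Python 3.5+ on an admin host that can reach the nodes over ssh, and the helper name command_hangs and the host name node2 are hypothetical. Nothing here is part of clvmd or the cluster suite. It treats "still running after N seconds" as hung and kills the local child so the check itself never blocks:

```python
import subprocess


def command_hangs(cmd, timeout_s):
    """Return True if cmd is still running after timeout_s seconds.

    subprocess.run() kills the child when the timeout expires, so this
    check returns even if the command itself would block forever.
    (Hypothetical helper, not part of any cluster tool.)
    """
    try:
        subprocess.run(cmd,
                       stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL,
                       timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return True  # still running at the deadline: treat as hung
    return False  # exited (successfully or not) before the deadline
```

For example, after stopping clvmd on one node, `command_hangs(["ssh", "node2", "vgscan"], 60)` would return True if vgscan on node2 has not finished within 60 s. Only the local ssh process is killed, so a vgscan that is really stuck stays stuck on the remote node.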
Here's what I can get from dlm_debug:

  clvmd rebuilt 5 resources
  clvmd purge requests
  clvmd purged 0 requests
  clvmd mark waiting requests
  clvmd marked 0 requests
  clvmd purge locks of departed nodes
  clvmd purged 0 locks
  clvmd update remastered resources
  clvmd updated 0 resources
  clvmd rebuild locks
  clvmd rebuilt 0 locks
  clvmd recover event 22 done
  clvmd move flags 0,0,1 ids 11,22,22
  clvmd process held requests
  clvmd processed 0 requests
  clvmd resend marked requests
  clvmd resent 0 requests
  clvmd recover event 22 finished
  clvmd move flags 1,0,0 ids 22,22,22
  clvmd move flags 0,1,0 ids 22,23,22
  clvmd move use event 23
  clvmd recover event 23
  clvmd add node 1
  clvmd total nodes 3
  clvmd rebuild resource directory
  clvmd rebuilt 5 resources
  clvmd purge requests
  clvmd purged 0 requests
  clvmd mark waiting requests
  clvmd marked 0 requests
  clvmd recover event 23 done
  clvmd move flags 0,0,1 ids 22,23,23
  clvmd process held requests
  clvmd processed 0 requests
  clvmd resend marked requests
  clvmd resent 0 requests
  clvmd recover event 23 finished

I'm running 4.6 with kernel-hugemem-2.6.9-67.0.7.EL, lvm2-cluster-2.02.27-2.el4_6.2 and dlm-kernel-hugemem-2.6.9-52.5.

Has anyone else seen anything like this?

Thanks,
Robert

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster