Robert Clark wrote:

> I'm having some problems with clvmd hanging on our 8-node cluster.
> Once hung, any LVM commands wait indefinitely. This normally happens
> when starting up the cluster or when multiple nodes reboot. After some
> experimentation I've managed to reproduce it consistently on a smaller
> 3-node test cluster by stopping clvmd on one node and then running
> vgscan on another. The vgscan hangs together with clvmd, and
> restarting clvmd on the stopped node doesn't wake it up.
>
> Once hung, an strace shows 3 clvmd threads: 2 waiting on futexes and
> one trying to read from /dev/misc/dlm_clvmd. All 3 threads wait
> indefinitely on these system calls. Here's the last part of the strace:
>
> [pid 2951] select(1024, [4 6], NULL, NULL, {90, 0}) = 1 (in [4], left {56, 190000})
> [pid 2951] accept(4, {sa_family=AF_FILE, path=@}, [2]) = 5
> [pid 2951] ioctl(6, 0x7805, 0) = 1
> [pid 2951] select(1024, [4 5 6], NULL, NULL, {90, 0}) = 1 (in [5], left {90, 0})
> [pid 2951] read(5, "3\0\0\0\0\0\0\0\0\0\0\0\v\0\0\0\0\4\4P_global\0\0", 4096) = 29
> [pid 2951] futex(0x84d64f4, FUTEX_WAIT, 2, NULL <unfinished ...>
>
> P_global doesn't show up in /proc/cluster/dlm_locks at this point.
> Here's what I can get from dlm_debug:
>
> clvmd rebuilt 5 resources
> clvmd purge requests
> clvmd purged 0 requests
> clvmd mark waiting requests
> clvmd marked 0 requests
> clvmd purge locks of departed nodes
> clvmd purged 0 locks
> clvmd update remastered resources
> clvmd updated 0 resources
> clvmd rebuild locks
> clvmd rebuilt 0 locks
> clvmd recover event 22 done
> clvmd move flags 0,0,1 ids 11,22,22
> clvmd process held requests
> clvmd processed 0 requests
> clvmd resend marked requests
> clvmd resent 0 requests
> clvmd recover event 22 finished
> clvmd move flags 1,0,0 ids 22,22,22
> clvmd move flags 0,1,0 ids 22,23,22
> clvmd move use event 23
> clvmd recover event 23
> clvmd add node 1
> clvmd total nodes 3
> clvmd rebuild resource directory
> clvmd rebuilt 5 resources
> clvmd purge requests
> clvmd purged 0 requests
> clvmd mark waiting requests
> clvmd marked 0 requests
> clvmd recover event 23 done
> clvmd move flags 0,0,1 ids 22,23,23
> clvmd process held requests
> clvmd processed 0 requests
> clvmd resend marked requests
> clvmd resent 0 requests
> clvmd recover event 23 finished
>
> I'm running 4.6 with kernel-hugemem-2.6.9-67.0.7.EL,
> lvm2-cluster-2.02.27-2.el4_6.2 & dlm-kernel-hugemem-2.6.9-52.5. Has
> anyone else seen anything like this?

Yes, we seem to have collected quite a few bugzillas on the subject!

The fix is in CVS for LVM2. Packages are on their way, I believe.

-- 
Chrissie
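
For anyone wanting to confirm the same hang, a minimal sketch of the
reproduction Robert describes; the hostnames node1/node2 and the
RHEL4-style init-script invocation are assumptions, not from the
original report:

    # On one node, stop clvmd while the rest of the cluster stays up.
    ssh node1 'service clvmd stop'

    # On another node, run any LVM command; per the report above it
    # hangs together with clvmd.
    ssh node2 'vgscan'

    # Restarting clvmd on the stopped node does not unblock the hung
    # vgscan/clvmd.
    ssh node1 'service clvmd start'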
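
And a sketch of how the diagnostics quoted above can be gathered. The
/proc/cluster/dlm_debug path and the step of echoing the lockspace name
into dlm_locks are assumptions based on the RHEL4 cluster suite
interface; the report itself only names /proc/cluster/dlm_locks and
"dlm_debug":

    # Follow all clvmd threads to see which system calls they block in.
    strace -f -p "$(pidof clvmd)"

    # Dump DLM lock state for the clvmd lockspace; a held P_global lock
    # would normally show up here. (Selecting the lockspace by echoing
    # its name first is an assumption about the RHEL4 DLM interface.)
    echo clvmd > /proc/cluster/dlm_locks
    cat /proc/cluster/dlm_locks

    # DLM recovery/debug messages, as quoted in the report.
    cat /proc/cluster/dlm_debug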