Hi David,
I am having quite some trouble with clvmd on corosync 2.3.0/dlm; apparently a
nonfunctional clvmd in the cluster can block all others (kern.log states
clvmd stuck for >120s in some dlm call). I tried to clean things up by
killing clvmd with kill -9, but it remains in state D or Z. Unfortunately, it
seems that those stuck processes still hold some dlm locks. When I
restart corosync on a node and run dlm_controld -D on it, I see "found
uncontrolled lockspace, tell corosync to remove nodeid from cluster".
Well, that's fine as a first step, but what about cleaning up the dlm
lockspace? dlm_tool leave <lockspace> hangs as well (sometimes it just
fails with error 49). The comment in dlm_controld/action.c is not very
satisfying: it says a reboot is needed, which is no fun if a whole cluster
is affected. I'd really appreciate a way to manually clean up old
lockspaces. I'd presume that an uncontrolled lockspace on an isolated node
should be easily removable...
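
For reference, this is roughly how the stuck processes can be spotted on a
node. It is only a minimal sketch: it assumes a Linux /proc filesystem, and
the helper names parse_stat_line and stuck_processes are made up for
illustration, they are not part of clvmd, dlm or corosync.

```python
import os


def parse_stat_line(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    open_idx = line.index("(")
    close_idx = line.rindex(")")
    pid = int(line[:open_idx])
    comm = line[open_idx + 1:close_idx]
    state = line[close_idx + 1:].split()[0]
    return pid, comm, state


def stuck_processes(name="clvmd", states=("D", "Z"), proc_root="/proc"):
    """Return (pid, state) for processes called `name` in one of `states`."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError):
            continue  # process exited during the scan, or unreadable entry
        if comm == name and state in states:
            found.append((pid, state))
    return found
```

Calling stuck_processes() on an affected node lists the clvmd PIDs that are
in uninterruptible sleep (D) or zombie (Z) state; it only reports them and
does not release anything held in dlm.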
Regards
Andreas
_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/