On Sun, Jul 01, 2007 at 02:17:48PM +0300, Janne Peltonen wrote:
> Hi!
>
> Sometimes, when I have cleanly shut down rgmanager on one node, and the
> services have nicely migrated to other nodes, trying to start rgmanager
> fails. Trying to access /dev/misc/dlm_rgmanager results in "No such
> device". clurgmgrd concludes that locks are not working and exits.
> (See strace output attached.)

Interesting. After the one node with failing rgmanagers was shot in the
head (there were no log lines about fencing, only two about deferring
fencing to an earlier node), the fenced node was left in 'off' state.
The other nodes kept their services running, but their rgmanagers
apparently got stuck: no more status checks and no response to the
clustat command.

The node that (apparently, since there is no log entry) did the fencing:

[jmmpelto@pcn2 ~]$ sudo cman_tool services
type             level name       id       state
fence            0     default    00010001 FAIL_ALL_STOPPED
[1 2 3 4 100]
dlm              1     clvmd      00010002 FAIL_ALL_STOPPED
[1 2 3 4 100]
dlm              1     rgmanager  00020002 FAIL_ALL_STOPPED
[1 2 3 4]

Other nodes with rgmanager running:

[jmmpelto@pcn3 ~]$ sudo cman_tool services
type             level name       id       state
fence            0     default    00010001 FAIL_START_WAIT
[2 3 4 100]
dlm              1     clvmd      00010002 FAIL_ALL_STOPPED
[1 2 3 4 100]
dlm              1     rgmanager  00020002 FAIL_ALL_STOPPED
[1 2 3 4]

The fifth node, without rgmanager:

[jmmpelto@pcnm ~]$ sudo cman_tool services
type             level name       id       state
fence            0     default    00010001 FAIL_START_WAIT
[2 3 4 100]
dlm              1     clvmd      00010002 FAIL_ALL_STOPPED
[1 2 3 4 100]

Er. What might be up?

--Janne

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster