On 07/27/2012 01:44 AM, DIMITROV, TANIO wrote:
Hello,
I'm testing a RHEL 6.2 cluster using CMAN.
It is a two-node cluster with no shared data. The problem is that if there is a connectivity problem between the nodes, each of them continues working stand-alone, which is OK (no shared data, manual fencing). But when the connection comes back up, the nodes kill each other's cman instances:
Jul 26 13:58:05.000 node1 corosync[15771]: cman killed by node 2 because we were killed by cman_tool or other application
Jul 26 13:58:05.000 node1 gfs_controld[15900]: cluster is down, exiting
Jul 26 13:58:05.000 node1 gfs_controld[15900]: daemon cpg_dispatch error 2
Jul 26 13:58:05.000 node1 dlm_controld[15848]: cluster is down, exiting
Can this be avoided somehow?
Thanks in advance!
Hi,
The error you see is the result of two clusters with existing state trying
to merge. Both nodes have previously been in a quorate cluster and
therefore have existing cluster state. At this time, CMAN and the other
cluster tools do not support merging cluster states, which is why you hit
this problem. The solution is to implement fencing: once a node is
fenced and rebooted, it starts with no state (i.e. not dirty) and can
join the surviving node (which has state, i.e. dirty) successfully.
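
For illustration only, a minimal two-node cluster.conf with power fencing
might look something like the sketch below. The cluster name, node names,
the choice of fence_ipmilan, and the BMC addresses/credentials are all
placeholders; substitute whatever fence hardware you actually have:

<?xml version="1.0"?>
<cluster name="mycluster" config_version="2">
  <!-- two_node="1" lets a single surviving node stay quorate -->
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1">
      <fence>
        <method name="1">
          <device name="ipmi-node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2">
      <fence>
        <method name="1">
          <device name="ipmi-node2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <!-- placeholder BMC addresses and credentials; use your real ones -->
    <fencedevice agent="fence_ipmilan" name="ipmi-node1" ipaddr="10.0.0.11" login="admin" passwd="secret" lanplus="1"/>
    <fencedevice agent="fence_ipmilan" name="ipmi-node2" ipaddr="10.0.0.12" login="admin" passwd="secret" lanplus="1"/>
  </fencedevices>
</cluster>

After updating cluster.conf (and bumping config_version), you can run
"fence_node <nodename>" from the other node to confirm that fencing
actually powers the target off before relying on it.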
While it is possible to run clusters without fencing, the behavior is
designed with fencing in mind, and you can end up with strange results
like the ones you've seen when fencing never triggers. On some occasions
both nodes will fence each other at the same time and you'll lose both
cluster nodes. If this is really a critical system, I highly recommend
configuring fencing.
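
One common way to reduce that fence race in a two-node cluster is to add
a delay to one node's fence device, so the other node always wins when
both try to fence at once. This is just a sketch of that idea, reusing
the hypothetical devices above; check that the fence agent version you
have supports the delay option, and pick a delay that suits your setup:

    <!-- node2's fence device waits 10 seconds, so node1 wins a simultaneous fence race -->
    <fencedevice agent="fence_ipmilan" name="ipmi-node2" ipaddr="10.0.0.12" login="admin" passwd="secret" lanplus="1" delay="10"/>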
Regards,
Ryan Mitchell
Red Hat Global Support Services