On 07/27/2012 01:44 AM, DIMITROV, TANIO wrote:
Hello,
I'm testing a RHEL 6.2 cluster using CMAN.
It is a two-node cluster with no shared data. The problem is that if there is a connectivity problem between the nodes, each of them continues working stand-alone, which is OK (no shared data, manual fencing). But when the connection comes back up, the nodes kill each other's cman instances:
Jul 26 13:58:05.000 node1 corosync[15771]: cman killed by node 2 because we were killed by cman_tool or other application
Jul 26 13:58:05.000 node1 gfs_controld[15900]: cluster is down, exiting
Jul 26 13:58:05.000 node1 gfs_controld[15900]: daemon cpg_dispatch error 2
Jul 26 13:58:05.000 node1 dlm_controld[15848]: cluster is down, exiting
Can this be avoided somehow?
Thanks in advance!
Hi,
The error you see is the result of two clusters with existing state trying
to merge. Both nodes have previously been in a quorate cluster and
therefore have existing cluster state. At this time, CMAN and the other
cluster tools do not support merging cluster states, which is why you hit
this problem. The solution is to implement fencing: once a node is
fenced and rebooted, it starts with no state (i.e. not dirty) and can
join the surviving node (which has state, i.e. dirty) successfully.
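
For illustration only, a minimal two-node cluster.conf with power fencing
might look something like the sketch below. The cluster name, node names,
the choice of fence_ipmilan, and the BMC addresses/credentials are all
placeholders; substitute whatever fence hardware you actually have:

<?xml version="1.0"?>
<cluster name="mycluster" config_version="2">
  <!-- two_node="1" lets a single surviving node stay quorate -->
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1">
      <fence>
        <method name="1">
          <device name="ipmi-node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2">
      <fence>
        <method name="1">
          <device name="ipmi-node2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <!-- placeholder BMC addresses and credentials; use your real ones -->
    <fencedevice agent="fence_ipmilan" name="ipmi-node1" ipaddr="10.0.0.11" login="admin" passwd="secret" lanplus="1"/>
    <fencedevice agent="fence_ipmilan" name="ipmi-node2" ipaddr="10.0.0.12" login="admin" passwd="secret" lanplus="1"/>
  </fencedevices>
</cluster>

After updating cluster.conf (and bumping config_version), you can run
"fence_node <nodename>" from the other node to confirm that fencing
actually powers the target off before relying on it.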
While it is possible to run clusters without fencing, the behavior is
designed with fencing in mind, and you can end up with strange results
like the ones you've seen when fencing never triggers. On some occasions
both nodes will fence each other at the same time and you'll lose both
cluster nodes. If this is really a critical system, I highly recommend
configuring fencing.
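
One common way to reduce that fence race in a two-node cluster is to add
a delay to one node's fence device, so the other node always wins when
both try to fence at once. This is just a sketch of that idea, reusing
the hypothetical devices above; check that the fence agent version you
have supports the delay option, and pick a delay that suits your setup:

    <!-- node2's fence device waits 10 seconds, so node1 wins a simultaneous fence race -->
    <fencedevice agent="fence_ipmilan" name="ipmi-node2" ipaddr="10.0.0.12" login="admin" passwd="secret" lanplus="1" delay="10"/>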
Regards,
Ryan Mitchell
Red Hat Global Support Services