Hi,

I have a two-node cluster (3.0.12) that works fine when both nodes start cleanly, but when I fence/reboot one of the nodes it can't connect to the existing cluster and instead forms its own cluster (as far as I understand). Here is the status of both nodes after the second node was rebooted:

[root@clit1-p ~]# clustat
Cluster Status for IT Cluster @ Tue Nov 29 05:19:48 2011
Member Status: Quorate

 Member Name       ID   Status
 ------ ----       ---- ------
 clit1-p              1 Online, Local
 clit2-p              2 Offline

[root@clit1-p ~]# corosync-quorumtool -s
Version: 1.2.3
Nodes: 1
Ring ID: 37680
Quorum type: quorum_cman
Quorate: Yes

[root@clit1-p ~]# cman_tool status
Version: 6.2.0
Config Version: 17
Cluster Name: IT Cluster
Cluster Id: 19690
Cluster Member: Yes
Cluster Generation: 37680
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Node votes: 1
Quorum: 1
Active subsystems: 7
Flags: 2node
Ports Bound: 0
Node name: clit1-p
Node ID: 1
Multicast addresses: 11.0.0.1
Node addresses: 10.0.0.1

[root@clit2-p ~]# clustat
Cluster Status for IT Cluster @ Tue Nov 29 05:21:36 2011
Member Status: Quorate

 Member Name       ID   Status
 ------ ----       ---- ------
 clit1-p              1 Offline
 clit2-p              2 Online, Local

[root@clit2-p ~]# corosync-quorumtool -s
Version: 1.2.3
Nodes: 1
Ring ID: 37848
Quorum type: quorum_cman
Quorate: Yes

[root@clit2-p ~]# cman_tool status
Version: 6.2.0
Config Version: 17
Cluster Name: IT Cluster
Cluster Id: 19690
Cluster Member: Yes
Cluster Generation: 38504
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Node votes: 1
Quorum: 1
Active subsystems: 7
Flags: 2node
Ports Bound: 0
Node name: clit2-p
Node ID: 2
Multicast addresses: 11.0.0.1
Node addresses: 10.0.0.2

Both nodes have the same config file (a skeleton of it is at the end of this mail). The only difference I can see is the Ring ID in corosync-quorumtool and the Cluster Generation in cman_tool, which ought to be the same numbers on both nodes, matching the first node.

In the log of the second node I don't see any errors, except these messages about every 30 seconds:

Nov 27 03:31:19 clit2-p corosync[2484]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov 27 03:31:19 clit2-p corosync[2484]: [CPG ] downlist received left_list: 0
Nov 27 03:31:19 clit2-p corosync[2484]: [CPG ] chosen downlist from node r(0) ip(192.168.127.85)
Nov 27 03:31:19 clit2-p corosync[2484]: [MAIN ] Completed service synchronization, ready to provide service.

Each time this happens, the Cluster Generation changes to a new value. When I restart cman on both nodes, the cluster works fine again (both nodes in the same cluster).

I guess the problem is somewhere in corosync, but I have no idea what could be causing it, and setting its log to debug doesn't help. I would be glad to get some help, or any direction on how to track the problem down.
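For reference, I haven't pasted the real cluster.conf; the skeleton below is only a sketch of its shape (fence methods, fence devices and resources trimmed), with the cluster name, config version, node names and node IDs matching the cman_tool output above:

<?xml version="1.0"?>
<cluster name="IT Cluster" config_version="17">
  <!-- two-node mode: quorum is held with a single vote -->
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="clit1-p" nodeid="1" votes="1">
      <!-- fence section trimmed -->
    </clusternode>
    <clusternode name="clit2-p" nodeid="2" votes="1">
      <!-- fence section trimmed -->
    </clusternode>
  </clusternodes>
  <fencedevices>
    <!-- fence devices trimmed -->
  </fencedevices>
</cluster>

With two_node="1" each node is quorate on its own, which is why both halves report "Quorate: Yes" above; the question is why they never merge back into one membership.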
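Since both nodes use the same multicast address (11.0.0.1) yet never see each other after the reboot, one check I'm considering is verifying multicast connectivity between them while the cluster is split. If omping is available on the distribution, I understand the same command is run on both nodes in parallel, something like:

[root@clit1-p ~]# omping clit1-p clit2-p
[root@clit2-p ~]# omping clit1-p clit2-p

or watching for the cluster's multicast traffic with tcpdump (eth0 here is just a placeholder for the actual cluster interface):

[root@clit2-p ~]# tcpdump -n -i eth0 host 11.0.0.1

Does that sound like a reasonable direction, or is there something in corosync itself I should look at first?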
Best regards!

--
David Golovan