Two-node cluster: nodes not connected to each other

Hi, I have a two-node cluster (3.0.12) that works fine after a clean start of both nodes, but when I fence/reboot one of the nodes, it can't rejoin the existing cluster and instead forms its own single-node cluster (as far as I understand).
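For reference, the quorum-related part of our cluster.conf looks roughly like this (fence devices and resources omitted, so treat it as a sketch rather than the exact file):

<cluster name="IT Cluster" config_version="17">
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="clit1-p" nodeid="1" votes="1"/>
    <clusternode name="clit2-p" nodeid="2" votes="1"/>
  </clusternodes>
</cluster>

The two_node="1" / expected_votes="1" pair is what allows each node to stay quorate on its own, which matches the "Flags: 2node" line in the output below.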
Here is the status of both nodes after the second node was rebooted:

[root@clit1-p ~]# clustat
Cluster Status for IT Cluster @ Tue Nov 29 05:19:48 2011
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 clit1-p                                                             1 Online, Local
 clit2-p                                                             2 Offline

[root@clit1-p ~]# corosync-quorumtool -s
Version:          1.2.3
Nodes:            1
Ring ID:          37680
Quorum type:      quorum_cman
Quorate:          Yes
[root@clit1-p ~]# cman_tool status
Version: 6.2.0
Config Version: 17
Cluster Name: IT Cluster
Cluster Id: 19690
Cluster Member: Yes
Cluster Generation: 37680
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Node votes: 1
Quorum: 1 
Active subsystems: 7
Flags: 2node
Ports Bound: 0 
Node name: clit1-p
Node ID: 1
Multicast addresses: 11.0.0.1
Node addresses: 10.0.0.1

[root@clit2-p ~]# clustat
Cluster Status for IT Cluster @ Tue Nov 29 05:21:36 2011
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 clit1-p                                                             1 Offline
 clit2-p                                                             2 Online, Local

[root@clit2-p ~]# corosync-quorumtool -s
Version:          1.2.3
Nodes:            1
Ring ID:          37848
Quorum type:      quorum_cman
Quorate:          Yes
[root@clit2-p ~]# cman_tool status
Version: 6.2.0
Config Version: 17
Cluster Name: IT Cluster
Cluster Id: 19690
Cluster Member: Yes
Cluster Generation: 38504
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Node votes: 1
Quorum: 1 
Active subsystems: 7
Flags: 2node
Ports Bound: 0 
Node name: clit2-p
Node ID: 2
Multicast addresses: 11.0.0.1
Node addresses: 10.0.0.2

Both nodes have the same config file, and the only difference I can see is the Ring ID in corosync-quorumtool and the Cluster Generation in cman_tool, which should be the same on both nodes (i.e. match the values on the first node).
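cman_tool nodes shows the same split, with each node marked as a member (M) locally and as dead (X) from the other side; retyped from memory, so the timestamp is approximate:

[root@clit1-p ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M  37680   2011-11-29 05:00:11  clit1-p
   2   X      0                        clit2-p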
In the log of the second node I don't see any errors, except these messages roughly every 30 seconds:
Nov 27 03:31:19 clit2-p corosync[2484]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov 27 03:31:19 clit2-p corosync[2484]:   [CPG   ] downlist received left_list: 0
Nov 27 03:31:19 clit2-p corosync[2484]:   [CPG   ] chosen downlist from node r(0) ip(192.168.127.85)
Nov 27 03:31:19 clit2-p corosync[2484]:   [MAIN  ] Completed service synchronization, ready to provide service.

And each time this happens, the Cluster Generation changes to a new value.
When I restart cman on both nodes, the cluster works fine again (both nodes end up in the same cluster).
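In case it is relevant, this is how multicast connectivity between the nodes can be tested with omping (it has to run on both nodes at the same time, and the tool may need to be installed first):

[root@clit1-p ~]# omping clit1-p clit2-p
[root@clit2-p ~]# omping clit1-p clit2-p

The multicast traffic itself can also be watched with tcpdump (eth0 here is just a placeholder for the cluster interface, and 11.0.0.1 is the multicast address from the cman_tool output above):

[root@clit1-p ~]# tcpdump -n -i eth0 host 11.0.0.1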

I guess that the problem is somewhere in corosync, but I have no idea what could cause it, and setting its log level to debug doesn't help.
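For completeness, debug logging was turned on via cluster.conf with something like this (approximate; it maps onto corosync's logging settings):

<logging debug="on"/>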
I would be glad to get some help, or any pointers on how to track down the problem.

Best regards!

--
David Golovan
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
