My understanding is that the node was fenced while rebooting. I suggest you look into the fencing logs as well. If your fencing logs are not detailed enough, use the following in cluster.conf to enable debug logging for fenced:

<logging> <logging_daemon name="fenced" debug="on"/> </logging>

Thanks
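In case it helps, the <logging> element goes directly under the top-level <cluster> element. A rough sketch (cluster name taken from your cman_tool output below, config_version bumped by one as an example, everything else elided):

<cluster name="ocluster" config_version="11">
    <logging>
        <logging_daemon name="fenced" debug="on"/>
    </logging>
    <clusternodes>
        ...
    </clusternodes>
    ...
</cluster>

After bumping config_version you can propagate it with "cman_tool version -r"; the fenced debug output should then end up in /var/log/cluster/fenced.log (and fence events in /var/log/messages).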
On Thu, Nov 7, 2013 at 5:34 PM, Yuriy Demchenko <demchenko.ya@xxxxxxxxx> wrote:
Hi,
I'm trying to set up a 3-node cluster (2 nodes + 1 standby node for quorum) with the cman+pacemaker stack, everything according to this quickstart article: http://clusterlabs.org/quickstart-redhat.html
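(For reference, the quickstart builds cluster.conf with ccs commands roughly like this - the general shape only, shown for one node, not my actual attached config:)

ccs -f /etc/cluster/cluster.conf --createcluster ocluster
ccs -f /etc/cluster/cluster.conf --addnode node-1.spb.stone.local
ccs -f /etc/cluster/cluster.conf --addfencedev pcmk agent=fence_pcmk
ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect node-1.spb.stone.local
ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk node-1.spb.stone.local pcmk-redirect port=node-1.spb.stone.local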
The cluster starts, all nodes see each other, quorum is gained, stonith is working, but I've run into a problem with cman: a node can't join the cluster after a reboot. cman starts, and cman_tool nodes reports only that node as a cluster member, while on the other 2 nodes it reports 2 nodes as cluster members and the 3rd as offline. cman stop/start/restart on the problem node has no effect - it still sees only itself. But if I do a cman restart on one of the working nodes, everything goes back to normal: all 3 nodes join the cluster, and subsequent cman service restarts on any node work fine - the node leaves the cluster and rejoins successfully. But again - only until the node's OS is rebooted.
For example:
[1] Working cluster:
Picture is the same on all 3 nodes (except for node name and ID) - same cluster name, cluster ID, multicast address.
[root@node-1 ~]# cman_tool nodes
Node Sts Inc Joined Name
1 M 592 2013-11-07 15:20:54 node-1.spb.stone.local
2 M 760 2013-11-07 15:20:54 node-2.spb.stone.local
3 M 760 2013-11-07 15:20:54 vnode-3.spb.stone.local
[root@node-1 ~]# cman_tool status
Version: 6.2.0
Config Version: 10
Cluster Name: ocluster
Cluster Id: 2059
Cluster Member: Yes
Cluster Generation: 760
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 7
Flags:
Ports Bound: 0
Node name: node-1.spb.stone.local
Node ID: 1
Multicast addresses: 239.192.8.19
Node addresses: 192.168.220.21
[2] I've rebooted node-1. After the reboot completed, "cman_tool nodes" on node-2 and vnode-3 shows this:
Node Sts Inc Joined Name
1 X 760 node-1.spb.stone.local
2 M 588 2013-11-07 15:11:23 node-2.spb.stone.local
3 M 760 2013-11-07 15:20:54 vnode-3.spb.stone.local
[root@node-2 ~]# cman_tool status
Version: 6.2.0
Config Version: 10
Cluster Name: ocluster
Cluster Id: 2059
Cluster Member: Yes
Cluster Generation: 764
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 2
Node votes: 1
Quorum: 2
Active subsystems: 7
Flags:
Ports Bound: 0
Node name: node-2.spb.stone.local
Node ID: 2
Multicast addresses: 239.192.8.19
Node addresses: 192.168.220.22
But, on the rebooted node-1 it shows this:
Node Sts Inc Joined Name
1 M 764 2013-11-07 15:49:01 node-1.spb.stone.local
2 X 0 node-2.spb.stone.local
3 X 0 vnode-3.spb.stone.local
[root@node-1 ~]# cman_tool status
Version: 6.2.0
Config Version: 10
Cluster Name: ocluster
Cluster Id: 2059
Cluster Member: Yes
Cluster Generation: 776
Membership state: Cluster-Member
Nodes: 1
Expected votes: 3
Total votes: 1
Node votes: 1
Quorum: 2 Activity blocked
Active subsystems: 7
Flags:
Ports Bound: 0
Node name: node-1.spb.stone.local
Node ID: 1
Multicast addresses: 239.192.8.19
Node addresses: 192.168.220.21
So, same cluster name, cluster ID, multicast address - but it can't see the other nodes. And there is nothing in /var/log/messages or /var/log/cluster/corosync.log on the other two nodes - they don't seem to notice node-1 coming back online at all; the last records are about node-1 leaving the cluster.
[3] If I now do "service cman restart" on node-2 or vnode-3, everything goes back to normal operation as in [1].
In the logs it shows up as node-2 leaving the cluster (service stop) and then both node-2 and node-1 joining simultaneously (service start):
Nov 7 11:47:06 vnode-3 corosync[26692]: [QUORUM] Members[2]: 2 3
Nov 7 11:47:06 vnode-3 corosync[26692]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov 7 11:47:06 vnode-3 kernel: dlm: closing connection to node 1
Nov 7 11:47:06 vnode-3 corosync[26692]: [CPG ] chosen downlist: sender r(0) ip(192.168.220.22) ; members(old:3 left:1)
Nov 7 11:47:06 vnode-3 corosync[26692]: [MAIN ] Completed service synchronization, ready to provide service.
Nov 7 11:53:28 vnode-3 corosync[26692]: [QUORUM] Members[1]: 3
Nov 7 11:53:28 vnode-3 corosync[26692]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov 7 11:53:28 vnode-3 corosync[26692]: [CPG ] chosen downlist: sender r(0) ip(192.168.220.14) ; members(old:2 left:1)
Nov 7 11:53:28 vnode-3 corosync[26692]: [MAIN ] Completed service synchronization, ready to provide service.
Nov 7 11:53:28 vnode-3 kernel: dlm: closing connection to node 2
Nov 7 11:53:30 vnode-3 corosync[26692]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov 7 11:53:30 vnode-3 corosync[26692]: [QUORUM] Members[2]: 1 3
Nov 7 11:53:30 vnode-3 corosync[26692]: [QUORUM] Members[2]: 1 3
Nov 7 11:53:30 vnode-3 corosync[26692]: [QUORUM] Members[3]: 1 2 3
Nov 7 11:53:30 vnode-3 corosync[26692]: [QUORUM] Members[3]: 1 2 3
Nov 7 11:53:30 vnode-3 corosync[26692]: [QUORUM] Members[3]: 1 2 3
Nov 7 11:53:30 vnode-3 corosync[26692]: [CPG ] chosen downlist: sender r(0) ip(192.168.220.21) ; members(old:1 left:0)
Nov 7 11:53:30 vnode-3 corosync[26692]: [MAIN ] Completed service synchronization, ready to provide service.
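(For clarity, the workaround in [3] is just this, e.g. on node-2 - after which cman_tool nodes shows all 3 members again as in [1]:)

[root@node-2 ~]# service cman restart
[root@node-2 ~]# cman_tool nodes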
I've set up such a cluster before in much the same configuration and never had any problems, but now I'm completely stuck.
So, what is wrong with my cluster, and how do I fix it?
OS: CentOS 6.4 with latest updates, firewall disabled, SELinux permissive, all 3 nodes on the same network. Multicast is working - checked with omping.
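(Roughly how the omping check went - run concurrently on all three nodes, with vnode-3's address inferred from the corosync log above:)

omping 192.168.220.21 192.168.220.22 192.168.220.14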
cman.x86_64 3.0.12.1-49.el6_4.2 @centos6-updates
corosync.x86_64 1.4.1-15.el6_4.1 @centos6-updates
pacemaker.x86_64 1.1.10-1.el6_4.4 @centos6-updates
cluster.conf is attached.
--
Yuriy Demchenko
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
--
http://linuxmantra.com