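Most likely the multicast traffic between the two nodes is not getting through your network. A successful ping only proves unicast works. Your logs show each node forming a one-node membership containing just its own address, so neither ever sees the other join, and fenced shoots the peer once post_join_delay expires.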
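
One way to check this independently of cman is a small UDP multicast test between the nodes. The sketch below is only illustrative: the group address 239.192.0.1 and port 54321 are placeholders I picked, not values taken from your cluster (cman_tool status should list the multicast address your cluster actually uses), the file name mcast_test.py is hypothetical, and it assumes a current Python 3 rather than the Python shipped with EL5.

```python
import socket
import struct
import sys

GROUP = "239.192.0.1"  # assumption: placeholder, use your cluster's multicast address
PORT = 54321           # assumption: any free UDP port, not the one the cluster uses


def is_multicast(addr):
    """Return True if addr is an IPv4 multicast address (224.0.0.0 to 239.255.255.255)."""
    return 224 <= int(addr.split(".")[0]) <= 239


def membership_request(group, iface="0.0.0.0"):
    """Build the 8-byte ip_mreq structure expected by IP_ADD_MEMBERSHIP."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))


def receive(group=GROUP, port=PORT, timeout=30.0):
    """Join the group and wait for one datagram; return (data, sender) or None on timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group))
        sock.settimeout(timeout)
        return sock.recvfrom(1024)
    except socket.timeout:
        return None
    finally:
        sock.close()


def send(group=GROUP, port=PORT, ttl=1, payload=b"multicast-test"):
    """Send one datagram to the group; TTL 1 keeps it on the local segment."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                        struct.pack("b", ttl))
        sock.sendto(payload, (group, port))
    finally:
        sock.close()


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("send", "recv"):
    if sys.argv[1] == "send":
        send()
        print("sent test datagram to %s:%d" % (GROUP, PORT))
    else:
        result = receive()
        if result is None:
            print("nothing received: multicast is probably not reaching this node")
        else:
            print("received %r from %s" % (result[0], result[1][0]))
```

Start "python3 mcast_test.py recv" on nodeb, then run "python3 mcast_test.py send" on nodea within 30 seconds, and repeat in the other direction. If the receiver times out while ping still works, look at the switch (IGMP snooping is a common culprit) and at any iptables rules on the nodes.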
linux-cluster-bounces@xxxxxxxxxx wrote on 04/15/2010 01:05:01 PM:
> Good afternoon,
> I'm trying to form my first cluster of two nodes, using iLO fence
> devices. I need some help because I can't find what I've missed.
> My main problem is that running "service cman start" reboots the other
> node, so I can't form the two-node cluster.
> I'm using the following on both nodea and nodeb (they are on the same
> VLAN and can ping each other fine):
>
> [root@nodea ~]# uname -a
> Linux nodea 2.6.18-164.15.1.el5 #1 SMP Wed Mar 17 11:30:06 EDT 2010
> x86_64 x86_64 x86_64 GNU/Linux
> [root@nodea ~]# rpm -qa |grep cman
> cman-2.0.115-1.el5_4.9
>
> [root@nodea ~]# cat /etc/cluster/cluster.conf (nodeb has the same file)
> <?xml version="1.0" ?>
> <cluster alias="VCluster" config_version="5" name="VCluster">
> <fence_daemon post_fail_delay="0" post_join_delay="25"/>
> <clusternodes>
> <clusternode name="nodea" nodeid="1" votes="1">
> <fence>
> <method name="1">
> <device name="nodeaILO"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="nodeb" nodeid="2" votes="1">
> <fence>
> <method name="1">
> <device name="nodebILO"/>
> </method>
> </fence>
> </clusternode>
> </clusternodes>
> <cman expected_votes="1" two_node="1"/>
> <fencedevices>
> <fencedevice agent="fence_ilo" hostname="nodeacn"
> login="user" name="nodeaILO" passwd="hp"/>
> <fencedevice agent="fence_ilo" hostname="nodebcn"
> login="user" name="nodebILO" passwd="hp"/>
> </fencedevices>
> <rm>
> <failoverdomains/>
> <resources/>
> </rm>
> </cluster>
>
> When I start the cman service, it hangs up for some time at the
> "Starting fencing..." step and after those configured 25secs it
> fences nodeb and reboots it.
> [root@nodea ~]# service cman start
> Starting cluster:
> Loading modules... done
> Mounting configfs... done
> Starting ccsd... done
> Starting cman... done
> Starting daemons... done
> Starting fencing... done
> [ OK ]
>
> "nodeb" gets rebooted:
> [root@nodeb ~]#
> Broadcast message from root (Thu Apr 15 18:42:24 2010):
>
> The system is going down for system halt NOW!
>
> In the syslog I can only find:
> Apr 15 18:40:59 nodea ccsd[16930]: Initial status:: Quorate
> Apr 15 18:40:59 nodea openais[16936]: [CLM ] Members Left:
> Apr 15 18:40:59 nodea openais[16936]: [CLM ] Members Joined:
> Apr 15 18:40:59 nodea openais[16936]: [CLM ] CLM CONFIGURATION CHANGE
> Apr 15 18:41:00 nodea openais[16936]: [CLM ] New Configuration:
> Apr 15 18:41:00 nodea openais[16936]: [CLM ] r(0) ip(10.192.16.42)
> Apr 15 18:41:00 nodea openais[16936]: [CLM ] Members Left:
> Apr 15 18:41:00 nodea openais[16936]: [CLM ] Members Joined:
> Apr 15 18:41:00 nodea openais[16936]: [CLM ] r(0) ip(10.192.16.42)
> Apr 15 18:41:00 nodea openais[16936]: [SYNC ] This node is within
> the primary component and will provide service.
> Apr 15 18:41:00 nodea openais[16936]: [TOTEM] entering OPERATIONAL state.
> Apr 15 18:41:00 nodea openais[16936]: [CMAN ] quorum regained,
> resuming activity
> Apr 15 18:41:00 nodea openais[16936]: [CLM ] got nodejoin message
> 10.192.16.42
> Apr 15 18:42:11 nodea fenced[16955]: nodeb not a cluster member
> after 25 sec post_join_delay
> Apr 15 18:42:11 nodea fenced[16955]: fencing node "nodeb"
> Apr 15 18:42:23 nodea fenced[16955]: fence "nodeb" success
>
> [root@nodea ~]# clustat
> Cluster Status for VCluster @ Thu Apr 15 18:55:23 2010
> Member Status: Quorate
>
> Member Name        ID   Status
> ------ ----        ---- ------
> nodea                 1 Online, Local
> nodeb                 2 Offline
>
> Then when nodeb starts again, I try to start cman there to join the
> cluster... but it again fences "nodea":
> [root@nodeb ~]# clustat
> Could not connect to CMAN: No such file or directory
> [root@nodeb ~]# service cman start
> Starting cluster:
> Loading modules... done
> Mounting configfs... done
> Starting ccsd... done
> Starting cman... done
> Starting qdiskd... done
> Starting daemons... done
> Starting fencing... (wait for 25secs again) done
> [ OK ]
> "nodea" gets rebooted:
> [root@nodea ~]#
> Broadcast message from root (Thu Apr 15 18:58:40 2010):
>
> The system is going down for system halt NOW!
>
> Apr 15 18:57:31 nodeb openais[11789]: [CLM ] Members Joined:
> Apr 15 18:57:31 nodeb openais[11789]: [CLM ] r(0) ip(10.192.16.44)
> Apr 15 18:57:31 nodeb openais[11789]: [SYNC ] This node is within
> the primary component and will provide service.
> Apr 15 18:57:31 nodeb openais[11789]: [TOTEM] entering OPERATIONAL state.
> Apr 15 18:57:31 nodeb openais[11789]: [CMAN ] quorum regained,
> resuming activity
> Apr 15 18:57:31 nodeb openais[11789]: [CLM ] got nodejoin message
> 10.192.16.44
> Apr 15 18:57:34 nodeb qdiskd[10323]: <info> Quorum Daemon Initializing
> Apr 15 18:57:34 nodeb qdiskd[10323]: <crit> Initialization failed
> Apr 15 18:58:42 nodeb fenced[11816]: nodea not a cluster member
> after 25 sec post_join_delay
> Apr 15 18:58:42 nodeb fenced[11816]: fencing node "nodea"
> Apr 15 18:58:54 nodeb fenced[11816]: fence "nodea" success
>
> And I can't get the two nodes to join the cluster...
> I guess I'm missing something in the cluster.conf file? I can't
> find what I'm doing wrong.
>
> Thanks for any help!
>
> Alex Re
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster