Good afternoon,

I'm trying to form my first two-node cluster, using iLO fence devices. I need some help because I can't find what I've missed. My main problem is that "service cman start" reboots the other node, so I can't form the two-node cluster.

This is what I'm running on both nodea and nodeb (they are on the same VLAN and can ping each other):

[root@nodea ~]# uname -a
Linux nodea 2.6.18-164.15.1.el5 #1 SMP Wed Mar 17 11:30:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

[root@nodea ~]# rpm -qa |grep cman
cman-2.0.115-1.el5_4.9

[root@nodea ~]# cat /etc/cluster/cluster.conf
(nodeb has the same file)

<?xml version="1.0" ?>
<cluster alias="VCluster" config_version="5" name="VCluster">
    <fence_daemon post_fail_delay="0" post_join_delay="25"/>
    <clusternodes>
        <clusternode name="nodea" nodeid="1" votes="1">
            <fence>
                <method name="1">
                    <device name="nodeaILO"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="nodeb" nodeid="2" votes="1">
            <fence>
                <method name="1">
                    <device name="nodebILO"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <cman expected_votes="1" two_node="1"/>
    <fencedevices>
        <fencedevice agent="fence_ilo" hostname="nodeacn" login="user" name="nodeaILO" passwd="hp"/>
        <fencedevice agent="fence_ilo" hostname="nodebcn" login="user" name="nodebILO" passwd="hp"/>
    </fencedevices>
    <rm>
        <failoverdomains/>
        <resources/>
    </rm>
</cluster>

When I start the cman service, it hangs for some time at the "Starting fencing..." step, and after the configured 25 seconds it fences nodeb and reboots it.

[root@nodea ~]# service cman start
Starting cluster:
   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... done
   Starting daemons... done
   Starting fencing... done
                                                           [  OK  ]

"nodeb" gets rebooted:

[root@nodeb ~]#
Broadcast message from root (Thu Apr 15 18:42:24 2010):

The system is going down for system halt NOW!
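The cluster.conf above refers to each fence device by name, once inside a <clusternode> and once under <fencedevices>. A minimal sketch of how those references could be cross-checked is below. It assumes the file content exactly as posted; the function name check_fence_references is hypothetical, and the check covers name references only, not whether the iLO hostnames or credentials actually work.

```python
import xml.etree.ElementTree as ET

# cluster.conf as posted above (identical on nodea and nodeb).
CLUSTER_CONF = """<?xml version="1.0" ?>
<cluster alias="VCluster" config_version="5" name="VCluster">
    <fence_daemon post_fail_delay="0" post_join_delay="25"/>
    <clusternodes>
        <clusternode name="nodea" nodeid="1" votes="1">
            <fence><method name="1"><device name="nodeaILO"/></method></fence>
        </clusternode>
        <clusternode name="nodeb" nodeid="2" votes="1">
            <fence><method name="1"><device name="nodebILO"/></method></fence>
        </clusternode>
    </clusternodes>
    <cman expected_votes="1" two_node="1"/>
    <fencedevices>
        <fencedevice agent="fence_ilo" hostname="nodeacn" login="user" name="nodeaILO" passwd="hp"/>
        <fencedevice agent="fence_ilo" hostname="nodebcn" login="user" name="nodebILO" passwd="hp"/>
    </fencedevices>
    <rm><failoverdomains/><resources/></rm>
</cluster>
"""


def check_fence_references(conf_text):
    """Return a list of problems with fence device references (hypothetical helper)."""
    root = ET.fromstring(conf_text)
    # Names declared under <fencedevices>.
    defined = {d.get("name") for d in root.findall("./fencedevices/fencedevice")}
    problems = []
    for node in root.findall("./clusternodes/clusternode"):
        # Names used by this node's fence methods.
        used = [d.get("name") for d in node.findall("./fence/method/device")]
        if not used:
            problems.append(f"{node.get('name')}: no fence device configured")
        for name in used:
            if name not in defined:
                problems.append(f"{node.get('name')}: device {name!r} is not defined")
    return problems


if __name__ == "__main__":
    issues = check_fence_references(CLUSTER_CONF)
    print("fence references look consistent" if not issues else "\n".join(issues))
```

For the file as posted, the function returns an empty list, so the device names themselves line up.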
In the syslog, all I can find is:

Apr 15 18:40:59 nodea ccsd[16930]: Initial status:: Quorate
Apr 15 18:40:59 nodea openais[16936]: [CLM  ] Members Left:
Apr 15 18:40:59 nodea openais[16936]: [CLM  ] Members Joined:
Apr 15 18:40:59 nodea openais[16936]: [CLM  ] CLM CONFIGURATION CHANGE
Apr 15 18:41:00 nodea openais[16936]: [CLM  ] New Configuration:
Apr 15 18:41:00 nodea openais[16936]: [CLM  ]   r(0) ip(10.192.16.42)
Apr 15 18:41:00 nodea openais[16936]: [CLM  ] Members Left:
Apr 15 18:41:00 nodea openais[16936]: [CLM  ] Members Joined:
Apr 15 18:41:00 nodea openais[16936]: [CLM  ]   r(0) ip(10.192.16.42)
Apr 15 18:41:00 nodea openais[16936]: [SYNC ] This node is within the primary component and will provide service.
Apr 15 18:41:00 nodea openais[16936]: [TOTEM] entering OPERATIONAL state.
Apr 15 18:41:00 nodea openais[16936]: [CMAN ] quorum regained, resuming activity
Apr 15 18:41:00 nodea openais[16936]: [CLM  ] got nodejoin message 10.192.16.42
Apr 15 18:42:11 nodea fenced[16955]: nodeb not a cluster member after 25 sec post_join_delay
Apr 15 18:42:11 nodea fenced[16955]: fencing node "nodeb"
Apr 15 18:42:23 nodea fenced[16955]: fence "nodeb" success

[root@nodea ~]# clustat
Cluster Status for VCluster @ Thu Apr 15 18:55:23 2010
Member Status: Quorate

 Member Name        ID   Status
 ------ ----        ---- ------
 nodea                 1 Online, Local
 nodeb                 2 Offline

Then, when nodeb comes back up, I try to start cman there to join the cluster, but this time it fences "nodea":

[root@nodeb ~]# clustat
Could not connect to CMAN: No such file or directory

[root@nodeb ~]# service cman start
Starting cluster:
   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... done
   Starting qdiskd... done
   Starting daemons... done
   Starting fencing... (wait for 25secs again) done
                                                           [  OK  ]

"nodea" gets rebooted:

[root@nodea ~]#
Broadcast message from root (Thu Apr 15 18:58:40 2010):

The system is going down for system halt NOW!
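The only lines in the nodea syslog that mention fencing come from the fenced daemon. A rough sketch of how they could be pulled out of a longer log is below. It assumes the syslog line layout shown above (timestamp, host, fenced[pid]: message); the names FENCED_RE and fenced_events are hypothetical, and the sample text is copied from the nodea log.

```python
import re

# A few lines copied from the nodea syslog above.
SYSLOG = """\
Apr 15 18:41:00 nodea openais[16936]: [CMAN ] quorum regained, resuming activity
Apr 15 18:41:00 nodea openais[16936]: [CLM ] got nodejoin message 10.192.16.42
Apr 15 18:42:11 nodea fenced[16955]: nodeb not a cluster member after 25 sec post_join_delay
Apr 15 18:42:11 nodea fenced[16955]: fencing node "nodeb"
Apr 15 18:42:23 nodea fenced[16955]: fence "nodeb" success
"""

# timestamp, hostname, then the message logged by fenced[pid]
FENCED_RE = re.compile(r'^(\w{3} +\d+ [\d:]+) (\S+) fenced\[\d+\]: (.*)$')


def fenced_events(text):
    """Return (timestamp, host, message) tuples for lines logged by fenced."""
    events = []
    for line in text.splitlines():
        match = FENCED_RE.match(line)
        if match:
            events.append(match.groups())
    return events


if __name__ == "__main__":
    for stamp, host, message in fenced_events(SYSLOG):
        print(f"{stamp}  {host}  {message}")
```

Running the same filter over the log from each node makes it easier to compare what fenced reported on either side.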
The syslog on nodeb shows:

Apr 15 18:57:31 nodeb openais[11789]: [CLM  ] Members Joined:
Apr 15 18:57:31 nodeb openais[11789]: [CLM  ]   r(0) ip(10.192.16.44)
Apr 15 18:57:31 nodeb openais[11789]: [SYNC ] This node is within the primary component and will provide service.
Apr 15 18:57:31 nodeb openais[11789]: [TOTEM] entering OPERATIONAL state.
Apr 15 18:57:31 nodeb openais[11789]: [CMAN ] quorum regained, resuming activity
Apr 15 18:57:31 nodeb openais[11789]: [CLM  ] got nodejoin message 10.192.16.44
Apr 15 18:57:34 nodeb qdiskd[10323]: <info> Quorum Daemon Initializing
Apr 15 18:57:34 nodeb qdiskd[10323]: <crit> Initialization failed
Apr 15 18:58:42 nodeb fenced[11816]: nodea not a cluster member after 25 sec post_join_delay
Apr 15 18:58:42 nodeb fenced[11816]: fencing node "nodea"
Apr 15 18:58:54 nodeb fenced[11816]: fence "nodea" success

So I can't get the two nodes to join the cluster. I guess I'm missing something in the cluster.conf file, but I can't find what I'm doing wrong.

Thanks for any help!

Alex Re
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster