FYI.
The issue was resolved with help from Red Hat support.
If the cluster nodes are defined in cluster.conf by IP address instead of by name, the node with the lowest node ID will not fence its peer.
Resolution:
- Add a name-to-IP mapping for each node in /etc/hosts.
- Use the node name, not the IP address, for each clusternode in cluster.conf (see the sketch below).
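A minimal sketch of the fix, assuming hostnames node1 and node2 (the real hostnames are not given in this thread) and the node IPs from the original message below:

/etc/hosts:

    198.18.9.33   node1
    198.18.9.34   node2

cluster.conf:

    <clusternode name="node1" nodeid="1" votes="1">
        <!-- fence configuration unchanged from the original below -->
    </clusternode>
    <clusternode name="node2" nodeid="2" votes="1">
        <!-- fence configuration unchanged from the original below -->
    </clusternode>

The name in each clusternode entry has to resolve the same way on both nodes, which is why the /etc/hosts mappings go in first.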
BRs,
Colin
-----Original Message-----
From: ext Wang2, Colin (NSN - CN/Cheng Du) <colin.wang2@xxxxxxx>
Reply-To: linux clustering <linux-cluster@xxxxxxxxxx>
To: linux-cluster@xxxxxxxxxx
Subject: RHCS does not fence 2nd node in a 2-node cluster
Date: Wed, 04 Nov 2009 15:40:34 +0800
Hi Gurus,

I am setting up a 2-node cluster. The environment is:

Hardware: IBM BladeCenter with 2 LS42 blades (AMD Opteron quad-core 2356 CPUs, 16GB memory).
Storage: EMC CX3-20f.
Storage switch: Brocade 4Gb 20-port switch in the IBM BladeCenter.
Network switch: Cisco switch module in the IBM BladeCenter.
Software: Red Hat EL 5.3 x86_64 (kernel 2.6.18-128.el5), Red Hat Cluster Suite 5.3.

This is a 2-node cluster, and my problem is:
- When I power off the 1st node with "halt -fp", the 2nd node fences the 1st node and takes over its services.
- When I power off the 2nd node with "halt -fp", the 1st node can't fence the 2nd node and can't take over its services.

fence_tool dump contents:

---- for the successful test

dump read: Success
1257305495 our_nodeid 2 our_name 198.18.9.34
1257305495 listen 4 member 5 groupd 7
1257305511 client 3: join default
1257305511 delay post_join 3s post_fail 0s
1257305511 clean start, skipping initial nodes
1257305511 setid default 65538
1257305511 start default 1 members 1 2
1257305511 do_recovery stop 0 start 1 finish 0
1257305511 first complete list empty warning
1257305511 finish default 1
1257305611 stop default
1257305611 start default 3 members 2
1257305611 do_recovery stop 1 start 3 finish 1
1257305611 add node 1 to list 1
1257305611 node "198.18.9.33" not a cman member, cn 1
1257305611 node "198.18.9.33" has not been fenced
1257305611 fencing node 198.18.9.33
1257305615 finish default 3
1257305658 client 3: dump

---- for the failed test

dump read: Success
1257300282 our_nodeid 1 our_name 198.18.9.33
1257300282 listen 4 member 5 groupd 7
1257300297 client 3: join default
1257300297 delay post_join 3s post_fail 0s
1257300297 clean start, skipping initial nodes
1257300297 setid default 65538
1257300297 start default 1 members 1 2
1257300297 do_recovery stop 0 start 1 finish 0
1257300297 first complete list empty warning
1257300297 finish default 1
1257303721 stop default
1257303721 start default 3 members 1
1257303721 do_recovery stop 1 start 3 finish 1
1257303721 add node 2 to list 1
1257303721 averting fence of node 198.18.9.34
1257303721 finish default 3
1257303759 client 3: dump

I think it was caused by "averting fence of node 198.18.9.34", but why does it avert the fence? Could you help me out? Thanks in advance.

This is the cluster.conf, for reference:

<?xml version="1.0"?>
<cluster config_version="14" name="x">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="198.18.9.33" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device blade="13" name="mm1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="198.18.9.34" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device blade="14" name="mm1"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <quorumd device="/dev/sdb1" interval="2" tko="7" votes="1">
    <heuristic interval="3" program="ping 198.18.9.61 -c1 -t2" score="10"/>
  </quorumd>
  <totem token="27000"/>
  <cman expected_votes="3" two_node="0" quorum_dev_poll="23000">
    <multicast addr="239.192.148.6"/>
  </cman>
  <fencedevices>
    <fencedevice agent="fence_bladecenter_ssh" ipaddr="x" login="x" name="mm1" passwd="x"/>
  </fencedevices>
</cluster>

BRs,
Colin
-- Linux-cluster mailing list Linux-cluster@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cluster