Hello,

I currently run a two-node RHEL 5.1 cluster with openais 0.80.3-13
and cman 2.0.80-1. Both nodes are hosted on VMware ESX 3.0.2 servers.

Fencing works fine, but here is my issue: whenever I simulate the
failure of a node (shutting down eth0 or a hard reboot), the node is
fenced but it can never rejoin the cluster.

Mar 17 14:24:32 VMClutest01 openais[1941]: [TOTEM] entering COMMIT state.
Mar 17 14:24:32 VMClutest01 openais[1941]: [TOTEM] entering RECOVERY state.
Mar 17 14:24:32 VMClutest01 openais[1941]: [TOTEM] position [0] member 10.148.46.50:
Mar 17 14:24:32 VMClutest01 openais[1941]: [TOTEM] previous ring seq 7692 rep 10.148.46.50
Mar 17 14:24:32 VMClutest01 openais[1941]: [TOTEM] aru c high delivered c received flag 1
Mar 17 14:24:32 VMClutest01 openais[1941]: [TOTEM] position [1] member 10.148.46.51:
Mar 17 14:24:32 VMClutest01 openais[1941]: [TOTEM] previous ring seq 7688 rep 10.148.46.51
Mar 17 14:24:32 VMClutest01 openais[1941]: [TOTEM] aru b high delivered b received flag 1
Mar 17 14:24:32 VMClutest01 openais[1941]: [TOTEM] Did not need to originate any messages in recovery.
Mar 17 14:24:32 VMClutest01 openais[1941]: [TOTEM] Sending initial ORF token
Mar 17 14:24:32 VMClutest01 openais[1941]: [CLM  ] CLM CONFIGURATION CHANGE
Mar 17 14:24:32 VMClutest01 openais[1941]: [CLM  ] New Configuration:
Mar 17 14:24:32 VMClutest01 openais[1941]: [CLM  ] r(0) ip(10.148.46.50)
Mar 17 14:24:32 VMClutest01 openais[1941]: [CLM  ] Members Left:
Mar 17 14:24:32 VMClutest01 openais[1941]: [CLM  ] Members Joined:
Mar 17 14:24:32 VMClutest01 openais[1941]: [CLM  ] CLM CONFIGURATION CHANGE
Mar 17 14:24:32 VMClutest01 openais[1941]: [CLM  ] New Configuration:
Mar 17 14:24:32 VMClutest01 openais[1941]: [CLM  ] r(0) ip(10.148.46.50)
Mar 17 14:24:32 VMClutest01 openais[1941]: [CLM  ] r(0) ip(10.148.46.51)
Mar 17 14:24:32 VMClutest01 openais[1941]: [CLM  ] Members Left:
Mar 17 14:24:32 VMClutest01 openais[1941]: [CLM  ] Members Joined:
Mar 17 14:24:32 VMClutest01 openais[1941]: [CLM  ] r(0) ip(10.148.46.51)
Mar 17 14:24:32 VMClutest01 openais[1941]: [SYNC ] This node is within the primary component and will provide service.
Mar 17 14:24:32 VMClutest01 openais[1941]: [TOTEM] entering OPERATIONAL state.
Mar 17 14:24:32 VMClutest01 openais[1941]: [

Is there anything to do after a failure in one node to
make it rejoin the cluster in a "clean" state?

If I try to cleanly restart node 2 with "shutdown -r now", it hangs
on stopping cluster services. If I hard reboot node 2, it can never
rejoin the cluster, and the log is the same as above.

My cluster.conf:

<?xml version="1.0"?>
<cluster alias="TestClu01" config_version="9" name="TestClu01">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="60"/>
  <clusternodes>
    <clusternode name="VMClutest01" nodeid="1" votes="1">
      <fence>
        <method name="FENCESX">
          <device name="ESX01"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="VMClutest02" nodeid="2" votes="1">
      <fence>
        <method name="FENCESX">
          <device name="ESX02"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice name="ESX01" agent="fence_vi3" ipaddr="10.148.45.206" port="VMClutest01" login="" passwd=" "/>
    <fencedevice name="ESX02" agent="fence_vi3" ipaddr="10.148.45.206" port="VMClutest02" login="" passwd=" "/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="AppCluster" ordered="0" restricted="0">
        <failoverdomainnode name="VMClutest01" priority="1"/>
        <failoverdomainnode name="VMClutest02" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="10.148.46.55" monitor_link="1"/>
    </resources>
    <service autostart="1" domain="AppCluster" exclusive="0" name="AppServer" recovery="restart">
      <ip ref="10.148.46.55"/>
    </service>
  </rm>
  <totem consensus="4800" join="1000" token="5000" token_retransmits_before_loss_const="20"/>
</cluster>

Any ideas?

Mathieu
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster