Eric Ritchie wrote:
I sometimes run into an issue when a node in my 2-node cluster is
rebooting and hangs on fenced. It seems it can't communicate with the
other node and after the post_join_delay, it fences the other node. This
happened again today, and when the second node rebooted after the fence,
they were in a split-brain configuration.
I saw in the cluster FAQ, in the cman section, question 6, that the
cluster communication network should be the same network as the fencing
device. I think this may be my problem but I don't understand why. I'm
using HP iLO for fencing and I set up cross-connect cables for the
cluster communication between the 2 nodes. Why would having cluster
communication and fencing on different networks be an issue?
Thanks for your time
Having distinct heartbeat and fencing networks creates the possibility of a
race condition, which you seem to be running into.
Cluster communication may not have stabilized within the post_join_delay
window, for any number of reasons including a network outage. If the fence
device is reached over the same path as the other cluster member, the fence
attempt from the starting node fails along with the cluster communication.
When the two paths are separate, the fence can succeed while cluster
communication is still failing, so a node that is otherwise healthy can be
fenced.
My recommendation is for cluster communication and iLO reachability to go
through the same NIC on the host.
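
The reasoning above can be sketched as a small toy model. This is only an
illustration of the argument, not how cman or fenced is implemented: the
function name, its parameters, and the outcome strings are all hypothetical.

```python
# Toy model only: startup_outcome and its parameters are hypothetical names,
# not part of cman, fenced, or any fence agent.

def startup_outcome(shared_path: bool, heartbeat_up: bool,
                    fence_path_up: bool = True) -> str:
    """Describe what a booting node does once post_join_delay expires."""
    if shared_path:
        # One path carries both kinds of traffic, so the fence device is
        # assumed reachable only when the peer is reachable too.
        fence_path_up = heartbeat_up
    if heartbeat_up:
        return "joins cluster, no fencing"
    if fence_path_up:
        # Separate paths: the peer cannot be seen, but it can still be fenced.
        return "fences peer (peer may be healthy)"
    return "fence attempt fails, peer left running"


for shared in (True, False):
    for heartbeat in (True, False):
        print(shared, heartbeat, "->", startup_outcome(shared, heartbeat))
```

Under these assumptions, the only case that fences a possibly healthy peer
is the one with separate paths and a heartbeat that is down.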
-regards
Subhendu
--
Subhendu Ghosh
Solutions Architect
Red Hat
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster