Celso K. Webber wrote:
Hello all,
Sorry if this question has been answered before, but I didn't find
anything in the archives.
We deployed Red Hat Cluster Suite at a customer site, and apparently
everything works fine until one node needs to fence the other (for
instance, when we turn a node off to test failover).
As usual for us, we configured the fencing using IPMI, which is
available on every modern branded server.
It seems that sometimes, one machine can't fence the other. Although we
can see the Cluster starting "ipmitool -I lanplus -H xxx -U xxx -P xxx
chassis power off", it times out while trying to power off the other
machine.
Have you tried running the above command by itself to see whether IPMI
on the systems responds correctly and powers the node off?
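One way to run that check by hand is sketched below. It builds the same
ipmitool command line quoted above and runs it under a time limit, so
you can see whether the BMC answers, how long it takes, and what it
prints. The helper names (build_ipmi_cmd, timed_run, check) and the
20-second limit are assumptions made for illustration; they are not
taken from the fence agent. Note that the "off" action really does
power the target node off.

```python
import subprocess
import sys
import time


def build_ipmi_cmd(host, user, password, action):
    """Build the ipmitool command line quoted in the message above."""
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user,
            "-P", password, "chassis", "power", action]


def timed_run(cmd, timeout_s):
    """Run cmd and return (exit_code, seconds_elapsed, output).

    exit_code is None if the command did not finish within timeout_s;
    in that case subprocess.run kills the child before raising.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,
                              universal_newlines=True, timeout=timeout_s)
        return proc.returncode, time.monotonic() - start, proc.stdout
    except subprocess.TimeoutExpired:
        return None, time.monotonic() - start, ""


def check(host, user, password, action="status", timeout_s=20):
    """Run one chassis power action and report how it went.

    timeout_s=20 is an arbitrary value chosen for this sketch.
    """
    code, secs, out = timed_run(
        build_ipmi_cmd(host, user, password, action), timeout_s)
    if code is None:
        print("%s: timed out after %.1fs" % (action, secs))
    else:
        print("%s: exit %d in %.1fs: %s" % (action, code, secs, out.strip()))
    return code


# Hypothetical usage, run from the node that fails to fence its peer:
#   check("xxx", "xxx", "xxx", "status")
#   check("xxx", "xxx", "xxx", "off")    # this powers the peer off
```

Running "status" and "off" back to back from the node that does the
fencing should show whether only the power-off request is slow or
whether the BMC is unresponsive in general.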
Even stranger, if at that exact moment we issue an
"ipmitool ... chassis power status" at the command line, it works fine
against the same node that cannot be fenced.
So I have a few questions:
* can a problem like this (fencing agent not being able to fence) cause
instability in the cluster? In our case, the cluster goes crazy: even if
we reboot the failed node and it does join the cluster, rgmanager never
gets started;
* has anyone faced this problem with IPMI? We have used IPMI as a fence
agent on dozens of implementations with Red Hat Cluster Suite, since
version 3, and we have never had this kind of problem. The servers in
question are Dell PowerEdge 2900s, and there is a crossover cable
between the onboard #1 NICs of the two servers, so that we have a
dedicated network path for one machine to turn off the other.
Thank you all for your support.
Regards,
Celso.
--
Subhendu Ghosh
Solutions Architect
Red Hat
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster