Re: Cluster behaviour if one node is not reachable & fenceable any longer?

Digimer <lists@xxxxxxxxxx> · Wed, 29 Jan 2014 10:43:48 -0500

On 29/01/14 10:14 AM, Nicolas Kukolja wrote:
> Hello,
>
> I have a cluster with three nodes (RHEL 5.5), and every server has an
> ipmilan module configured as its fencing device in my cluster config.
> Now, if one of the nodes is not reachable and its fencing device is not
> reachable either, the other two nodes try to fence this node again
> and again, without ever stopping.
>
> Only when this node is reachable (and fenceable) again does the fencing
> succeed and the cluster service move to another node.
>
> Why does the service not move to another node earlier? I think it's a
> common error scenario that one node and its fencing device are both
> unreachable, e.g. due to power problems.
> How do I have to change the cluster configuration to get my
> expected behaviour?
>
> Thanks in advance for any suggestions...
>
> Kind regards,
> Nicolas

This behaviour is expected and by design. The healthy nodes can't safely
recover until they know what state the lost node is in. The cluster is
not allowed to simply assume that the lost node is dead (no way to tell
"disconnected but working" from "smouldering pile of rubble"). If it
guessed wrong and the lost node were still running the service, starting
it on a second node could corrupt any shared data.

The way I deal with this is a second fence method. I use a pair of 
switched PDUs behind each node (one PDU for the first PSU in each node 
and the second PDU for the second PSU in each node). This way, if IPMI 
fencing fails, the nodes will connect to the PDUs and cut the power to 
the lost node, thus ensuring it's off and allowing prompt recovery of 
services.
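
To make the fallback idea concrete, here is a small Python model of
ordered fence methods. It is only a sketch of the logic described above,
not the fence daemon's actual code: the function and device names are
made up for illustration, and max_rounds exists only so the example
terminates (the real cluster keeps retrying, as you have seen). A method
counts as successful only if every device in it succeeds, which is why
both PDUs have to cut power before recovery may proceed.

```python
from typing import Callable, Sequence

# A fence device is modelled as a callable that returns True only when it
# has confirmed that the target node is off. These are hypothetical
# stand-ins, not real fence agents.
Device = Callable[[str], bool]


def fence_node(node: str,
               methods: Sequence[Sequence[Device]],
               max_rounds: int = 3) -> bool:
    """Try each fence method in order until one fully succeeds.

    max_rounds is an assumption of this sketch so that it terminates;
    it does not correspond to a real configuration option.
    """
    for _ in range(max_rounds):
        for devices in methods:
            # Every device in the method must succeed (e.g. both PDUs).
            if all(device(node) for device in devices):
                return True
    return False


def unreachable_ipmi(node: str) -> bool:
    # The node lost power, so its IPMI interface is gone as well.
    return False


def pdu_outlet_off(node: str) -> bool:
    # The switched PDU is powered independently and can cut the outlet.
    return True


ipmi_only = [[unreachable_ipmi]]
ipmi_then_pdu = [[unreachable_ipmi], [pdu_outlet_off, pdu_outlet_off]]

print(fence_node("node3", ipmi_only))      # False: recovery stays blocked
print(fence_node("node3", ipmi_then_pdu))  # True: services may be recovered
```

With only the IPMI method, the fence never succeeds and the service
cannot move. With the PDU method as a backup, the second method confirms
the node is off and recovery can go ahead.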

This might help:

* https://alteeve.ca/w/AN!Cluster_Tutorial_2#Why_Switched_PDUs.3F
* https://alteeve.ca/w/AN!Cluster_Tutorial_2#A_Map.21
* https://alteeve.ca/w/AN!Cluster_Tutorial_2#Using_the_Fence_Devices

Cheers

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?
