Re: Cluster behaviour if one node is not reachable & fenceable any longer?

On 29/01/14 10:14 AM, Nicolas Kukolja wrote:
Hello,

I have a cluster with three nodes (RHEL 5.5), and every server has an
ipmilan module configured as its fencing device in my cluster config.
If one of the nodes becomes unreachable and its fencing device is
unreachable too, the other two nodes try to fence this node again
and again, without ever stopping.

Only when this node is reachable (& fenceable) again does the fencing
complete successfully and the cluster service move to another node.

Why does the service not move to another node earlier? I think it's a
common failure scenario that a node and its fencing device are both
unreachable at the same time, e.g. because of power problems.
How do I have to change the cluster configuration to get the
behaviour I expect?

Thanks in advance for any suggestions...

Kind regards,
Nicolas

This behaviour is expected and by design. The healthy nodes can't safely recover until they know what state the lost node is in. The cluster is not allowed to simply assume that the lost node is dead (no way to tell "disconnected but working" from "smouldering pile of rubble").

The way I deal with this is a second fence method. I use a pair of switched PDUs behind each node (one PDU for the first PSU in each node and the second PDU for the second PSU in each node). This way, if IPMI fencing fails, the nodes will connect to the PDUs and cut the power to the lost node, thus ensuring it's off and allowing prompt recovery of services.
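
To make the ordering concrete, here is a rough Python sketch of the logic, not the real fence daemon code. All names in it (fence, run_method, ipmi_unreachable, pdu_outlet_off) are made up for illustration, and max_rounds exists only so the demo terminates; the real daemon keeps retrying until a method succeeds. It shows two things: methods are tried in order, and a method with several devices (the two PDUs) counts as successful only when every device in it succeeds.

```python
from typing import Callable, List, Optional, Tuple

# A fence action returns True only when it has confirmed the node is off.
FenceAction = Callable[[str], bool]
FenceMethod = Tuple[str, List[FenceAction]]


def run_method(node: str, devices: List[FenceAction]) -> bool:
    """A method succeeds only if every device in it succeeds."""
    return all(device(node) for device in devices)


def fence(node: str, methods: List[FenceMethod], max_rounds: int) -> Optional[str]:
    """Try each method in order; repeat the whole list if all of them fail.

    Returns the name of the method that worked, or None if max_rounds
    passes without success (hypothetical limit, for demonstration only).
    """
    for _ in range(max_rounds):
        for name, devices in methods:
            if run_method(node, devices):
                return name  # node is confirmed off, recovery may begin
    return None  # still unknown state, so services must stay blocked


def ipmi_unreachable(node: str) -> bool:
    # The node lost power, so its IPMI BMC cannot answer.
    return False


def pdu_outlet_off(node: str) -> bool:
    # The switched PDU is on separate power and confirms the outlet is off.
    return True


if __name__ == "__main__":
    ipmi_only = [("ipmi", [ipmi_unreachable])]
    ipmi_then_pdu = ipmi_only + [("pdu", [pdu_outlet_off, pdu_outlet_off])]
    print(fence("node3", ipmi_only, max_rounds=3))
    print(fence("node3", ipmi_then_pdu, max_rounds=3))
```

With only the IPMI method, the loop never gets a confirmation and recovery stays blocked. With the PDU method listed second, the first pass falls through to it and fencing completes.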

This might help:

* https://alteeve.ca/w/AN!Cluster_Tutorial_2#Why_Switched_PDUs.3F
* https://alteeve.ca/w/AN!Cluster_Tutorial_2#A_Map.21
* https://alteeve.ca/w/AN!Cluster_Tutorial_2#Using_the_Fence_Devices

Cheers

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster



