Hi,
I've got most of my cluster pretty much sorted out, apart from kicking
nodes from the cluster when they fail.
Is there a way to automate removing a failed node? I have 4 nodes sharing
2 GFS file systems: a root FS and a data FS. If I pull the network cable
from one of them, or just power it off, the remaining nodes stop
responding. The only way to get them responding again is to bring the
missing node back, even though there are still enough nodes to maintain
quorum (3 out of 4).
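
To spell out the quorum arithmetic I'm relying on, here is a minimal sketch.
It assumes one vote per node and a simple majority rule (more than half of
the expected votes). The function names are made up for illustration and are
not part of any cluster tool.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest vote count that is a strict majority of expected_votes."""
    # Assumption: simple majority rule, i.e. floor(expected / 2) + 1.
    return expected_votes // 2 + 1


def is_quorate(expected_votes: int, live_votes: int) -> bool:
    """True if the surviving nodes hold enough votes to keep quorum."""
    return live_votes >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    # My setup: 4 nodes, 1 vote each (assumed), one node lost.
    print(quorum_threshold(4))   # majority threshold for 4 votes
    print(is_quorate(4, 3))      # 3 survivors out of 4
    print(is_quorate(4, 2))      # 2 survivors out of 4
```

By this reckoning, losing one node out of four should leave the other three
quorate, which is why I expected them to carry on.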
Can anyone suggest a way around this? How can I make the 3 remaining nodes
just kick the missing node out of the cluster and DLM group (possibly
after some timeout, e.g. 10 seconds) and resume operation until the node
rejoins?
This may or may not be related to the fact that I'm running a shared GFS
root, but any pointers would be welcome.
Thanks.
Gordan
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster