I have a really strange problem on one of my clusters. It exhibits all
the signs of broken fencing, but the fencing agents work when tested
manually, and I cannot find anything in syslog to suggest that the
surviving node is even attempting to fence (it just locks up on GFS
access until the other node returns).
Has anybody got any suggestions on how to troubleshoot this?
The relevant extract from my cluster.conf is:
<clusternodes>
  <clusternode name="hades-cls" nodeid="1" votes="1">
    ...
    <fence>
      <method name="1">
        <device name="hades-oob"/>
      </method>
    </fence>
    ...
  </clusternode>
  <clusternode name="persephone-cls" nodeid="2" votes="1">
    ...
    <fence>
      <method name="1">
        <device name="persephone-oob"/>
      </method>
    </fence>
  </clusternode>
</clusternodes>
<fencedevices>
  <fencedevice agent="fence_eric" ipaddr="10.1.254.251"
               login="fence" passwd="some_password" name="hades-oob"/>
  <fencedevice agent="fence_eric" ipaddr="10.1.254.252"
               login="fence" passwd="some_password" name="persephone-oob"/>
</fencedevices>
...
I have a near-identical setup on all my other clusters, so this is
somewhat baffling. What else could be relevant here, specifically given
that no fencing attempts even show up in the logs? I have set up scores
of RHCS clusters and have never seen anything like this before.
The only unusual thing about this cluster is that I had to write a
bespoke fencing agent for the machines, but the agent works correctly
when I use it manually to power down or reboot the machines.
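
One thing that may be worth ruling out with a bespoke agent is the way it
receives its parameters. The sketch below assumes the usual convention
that fenced hands an agent its options as name=value lines on standard
input rather than as command-line arguments; that assumption should be
checked against the installed version. The function name
parse_fence_options is hypothetical, and the sample values are taken from
the fencedevice entries above. It only illustrates the parsing step and
is not the actual fence_eric code.

```python
import io


def parse_fence_options(stream):
    """Parse name=value lines, one per line, into a dict.

    Assumption: this mirrors how fenced is expected to pass options to an
    agent on stdin. Blank lines and lines starting with '#' are skipped.
    """
    options = {}
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # no '=' on the line, so ignore it
        options[name.strip()] = value.strip()
    return options


# Simulated stdin, built from the attributes of the "hades-oob" device.
sample = io.StringIO(
    "agent=fence_eric\n"
    "ipaddr=10.1.254.251\n"
    "login=fence\n"
    "passwd=some_password\n"
)
opts = parse_fence_options(sample)

# Report any parameter the agent would need but did not receive.
missing = [key for key in ("ipaddr", "login", "passwd") if key not in opts]
print("missing options:", missing)
```

An agent that only reads its command line would pass a manual test and
still fail when invoked this way, although that alone would not explain
the absence of any fencing attempt in syslog.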
TIA.
Gordan
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster