Greg Forte wrote:
And it still doesn't appear to work ... I can turn the outlets on and
off from the command line, but if I down the interface on a node, the
other node reports that it's removing the "failed" node from the
cluster, and that it's fencing the "failed" node, but the "failed" node
never gets shut down. Does this get logged somewhere besides
/var/log/messages, or is there a way to force it to be more verbose? If
I could see what command fenced is actually invoking that might help ...
Well, in case anyone is interested, I got fed up with having no decent
logging from any of these components, so I finally used tcpdump to
monitor the telnet connection between the non-failed node and the PDUs
as it tried to fence them ... and it turns out that fence_apc was trying
to turn each port ON twice, instead of OFF and then ON like it's
supposed to according to my configuration. The fault apparently lies
somewhere in ccsd or fenced, because the fence_apc script definitely
responds properly to the on|off|reboot options, both on the command line
and when they are passed on stdin the way fenced does it.
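
For anyone who wants to repeat the stdin test, here is a minimal Python sketch of feeding options to a fence agent as name=value lines on stdin. The key names (ipaddr, login, passwd, port, option), the address and the credentials are assumptions for illustration and are not taken from my setup; check the fence_apc man page for what your version actually accepts.

```python
import subprocess


def build_agent_stdin(params):
    """Render options as name=value lines, one per line, for the agent's stdin."""
    return "".join(f"{key}={value}\n" for key, value in params.items())


def run_agent(agent, params):
    """Run the agent with the options on stdin and return its exit status."""
    result = subprocess.run(
        [agent],
        input=build_agent_stdin(params),
        text=True,
        capture_output=True,
    )
    return result.returncode


# Hypothetical values; the key names are assumptions, not confirmed by this post.
example = {
    "ipaddr": "192.0.2.10",
    "login": "apc",
    "passwd": "apc",
    "port": "1",
    "option": "off",
}

if __name__ == "__main__":
    # Only print the payload here; call run_agent("fence_apc", example)
    # yourself once the values match your PDU.
    print(build_agent_stdin(example), end="")
```

Printing the payload first makes it easy to compare against what actually goes over the wire in the tcpdump capture.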
I changed my cluster.conf so that it uses 'reboot' instead of 'off' and
'on'. The old conf looked like this:
<device name="FENCE1" option="off" port="1"/>
<device name="FENCE2" option="off" port="1"/>
<device name="FENCE1" option="on" port="1"/>
<device name="FENCE2" option="on" port="1"/>
and the new one looks like this:
<device name="FENCE1" option="reboot" port="1"/>
<device name="FENCE2" option="reboot" port="1"/>
I also increased the reboot wait time on the PDUs to make sure it'd wait
long enough, and that SEEMS to work (once I remembered to stop ccsd
before updating my cluster.conf by hand, so that it didn't immediately
replace it with the old one ;-)
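
To double-check what sequence of actions a config is actually asking for, something like the following Python sketch lists the <device> entries in document order. The <method> wrapper element and its name are assumptions added only so the fragment parses on its own; the device lines are the ones from the old and new configs above.

```python
import xml.etree.ElementTree as ET

# The <method> wrapper is an assumption so that each fragment is well-formed.
OLD_CONF = """
<method name="1">
  <device name="FENCE1" option="off" port="1"/>
  <device name="FENCE2" option="off" port="1"/>
  <device name="FENCE1" option="on" port="1"/>
  <device name="FENCE2" option="on" port="1"/>
</method>
"""

NEW_CONF = """
<method name="1">
  <device name="FENCE1" option="reboot" port="1"/>
  <device name="FENCE2" option="reboot" port="1"/>
</method>
"""


def fence_steps(xml_text):
    """Return (device name, port, option) tuples in the order they appear."""
    root = ET.fromstring(xml_text.strip())
    return [
        (dev.get("name"), dev.get("port"), dev.get("option"))
        for dev in root.iter("device")
    ]


if __name__ == "__main__":
    for label, conf in (("old", OLD_CONF), ("new", NEW_CONF)):
        print(label)
        for name, port, option in fence_steps(conf):
            print(f"  {name} port {port}: {option}")
```

This only shows what the file requests; it says nothing about what fenced ends up sending, which is where the tcpdump capture comes in.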
Of course, I can't bring up any of the per-node fencing configuration
items in system-config-cluster anymore, but I think I mentioned that
previously - when I set them up through the gui it put "switch=" options
in each <device /> tag, and then when I shut down and restarted the gui
it complained that the file was formatted improperly. I removed those
options by hand, and then the gui worked again, but ever since the
fencing info hasn't been available ...
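
The hand edit that removed the "switch=" options could be scripted roughly as below. This is only a sketch: the <method> wrapper and the switch="0" value in the sample are hypothetical (the value the gui wrote is not recorded here), and it does nothing to explain why the gui stopped showing the fencing info afterwards.

```python
import xml.etree.ElementTree as ET


def strip_switch(xml_text):
    """Remove any switch attribute from <device> elements.

    Returns the rewritten XML text and the number of attributes removed.
    """
    root = ET.fromstring(xml_text.strip())
    removed = 0
    for device in root.iter("device"):
        if device.attrib.pop("switch", None) is not None:
            removed += 1
    return ET.tostring(root, encoding="unicode"), removed


# Hypothetical sample; the wrapper element and the switch value are assumptions.
SAMPLE = (
    '<method name="1">'
    '<device name="FENCE1" option="reboot" port="1" switch="0"/>'
    '<device name="FENCE2" option="reboot" port="1" switch="0"/>'
    "</method>"
)

if __name__ == "__main__":
    cleaned, count = strip_switch(SAMPLE)
    print(f"removed {count} switch attribute(s)")
    print(cleaned)
```

As with any manual change to cluster.conf, stop ccsd first so the edited file isn't replaced by the old copy.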
Any developers care to comment on any of this? I'm finding it really
tough to believe that this is a supported Red Hat "product".
-g
Greg Forte
gforte@xxxxxxxx
IT - User Services
University of Delaware
302-831-1982
Newark, DE
--
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster