Eric Kerin wrote:
> Greg,
>
> I'm using the fence_apc agent on my cluster with APC 7900s, and fencing
> is working perfectly for me, and has for more than 6 months now.
Thanks, Eric, but the fence_apc script is definitely not the issue - I
had to make a couple of minor changes to fence_apc's regexps, and it now
works both with command-line options and passing arguments through
stdin. This doesn't explain why the cluster conf doesn't work when it
has "off" and then "on" as set up by system-config-cluster (and it did
that itself, all I did was configure the ip address and login for the
fence devices, and tell it which ports to use), but it does work when I
make the change to 'reboot' as described in my previous message (this is
the default option, anyway, which I assume is why yours works with no
"option=" option).
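For anyone reproducing this by hand, here is a minimal Python sketch of what an "off" then "on" method should hand the agent, compared with a single "reboot" call. It assumes the agent reads newline-separated name=value pairs on stdin and that fence_apc's parameter names at the time were ipaddr, login, passwd, port and option. The address, credentials and outlet number are hypothetical.

```python
"""Sketch: the name=value block a fence agent reads on stdin.

Assumptions (not taken from the thread): the parameter names ipaddr, login,
passwd, port and option mirror those fence_apc accepted at the time, and the
address, credentials and outlet number below are hypothetical placeholders.
"""


def fence_stdin_args(device: dict, port: str, option: str) -> str:
    """Return newline-separated name=value pairs for one agent invocation."""
    params = dict(device)      # copy, so the shared device definition is untouched
    params["port"] = port      # outlet on the APC switch
    params["option"] = option  # action: "off", "on" or "reboot"
    return "".join(f"{name}={value}\n" for name, value in params.items())


# Hypothetical APC 7900 device definition.
apc = {"ipaddr": "192.0.2.10", "login": "apc", "passwd": "secret"}

# An "off" then "on" method should produce two calls with different options...
off_then_on = [fence_stdin_args(apc, "3", opt) for opt in ("off", "on")]

# ...while the "reboot" variant is a single call.
reboot = fence_stdin_args(apc, "3", "reboot")

if __name__ == "__main__":
    for block in off_then_on + [reboot]:
        print(block)
```

Feeding one of these strings to the agent, for example with subprocess.run(["fence_apc"], input=block, text=True), reproduces a single fencing call by hand. That makes it easy to check whether an "off" request really arrives as option=off.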
> You can test that the cluster is configured correctly to fence a node by
> running "fence_node <nodename>". This will use the cluster's config file
> to fence the node, ensuring that all config settings are correct.
Actually, that doesn't seem to work for me - no matter what nodename I
specify, and regardless of whether I run it on the node I'm trying to
fence or the other node (it's a two-node cluster), it comes back with
"Fence of 'hostname' was unsuccessful." I suspect this is because it's
a two-node cluster so fenced doesn't want to let me kick out a node
that's still active ... or maybe it's just a host name problem.
Regardless, it _does_ work correctly if I simulate a real failure, after
I made the aforementioned cluster.conf change, so I'm confident that
I've got it configured correctly. My gripe is that (a) the GUI tool
can't seem to generate even the simplest conf correctly, and (b)
there's apparently a bug in fenced where it passes an "option=on" to the
fence_apc agent, when it clearly should be "option=off". Or else ccsd
is misparsing the cluster.conf file. I don't see how else to explain
that the conf file said "off", then "on", but the daemon did "on", "on".
> When updating the cluster.conf file by hand, you are updating the
> config_version attribute of the cluster node, right? I do updates to my
> cluster.conf file by hand pretty much exclusively, while the cluster is
> running, and with no problems whatsoever. Changes propagate as expected
> after running "ccs_tool update <cluster.conf filename>" and "cman_tool
> version -r <new_version_number>".
Hmmm ... nope, but I will do so in the future. ;-) Thanks.
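Here is a minimal Python sketch of the by-hand update Eric describes: increment config_version on the <cluster> element, then run ccs_tool and cman_tool with the new number. The /etc/cluster/cluster.conf path and the sample config are assumptions for illustration. The script only prints the commands rather than running them.

```python
"""Sketch: bump config_version in cluster.conf, then list the propagation commands.

Assumptions: the root element is <cluster config_version="N" ...>, and
/etc/cluster/cluster.conf is the usual path. ElementTree drops comments and
the XML declaration, so review the output before writing it back.
"""
import xml.etree.ElementTree as ET


def bump_config_version(xml_text: str) -> tuple[str, int]:
    """Return the updated XML text and the new version number."""
    root = ET.fromstring(xml_text)
    if root.tag != "cluster":
        raise ValueError("expected <cluster> as the root element")
    new_version = int(root.get("config_version", "0")) + 1
    root.set("config_version", str(new_version))
    return ET.tostring(root, encoding="unicode"), new_version


def propagation_commands(conf_path: str, version: int) -> list[list[str]]:
    """The two commands from the thread, in the order they are run."""
    return [
        ["ccs_tool", "update", conf_path],
        ["cman_tool", "version", "-r", str(version)],
    ]


if __name__ == "__main__":
    sample = '<cluster name="demo" config_version="7"><clusternodes/></cluster>'
    new_xml, version = bump_config_version(sample)
    print(new_xml)
    for cmd in propagation_commands("/etc/cluster/cluster.conf", version):
        print(" ".join(cmd))
```

Forgetting the increment is the usual reason a hand edit seems to be ignored. Deriving the cman_tool argument from the same number that was written into the file keeps the two in step.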
-g
Greg Forte
gforte@xxxxxxxx
IT - User Services
University of Delaware
302-831-1982
Newark, DE
--
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster