Hi,
I'm currently configuring fencing devices for a two-node RHEL4
cluster. The description is fairly long, so please bear with me.
I have 2 nodes (let's call them stone1 and stone2) and 2 APC fencing
devices (pdu1 and pdu2, both APC 7952 devices). Both stone1 and stone2
have dual power supplies. Stone1's power supplies are connected to outlet
13 of pdu1 and pdu2. Stone2's power supplies are connected to outlet 20
of both PDUs. My question is: when configuring fencing for each node, I
need to specify which fence devices to add to that node's fence level.
Is it correct to specify the following?
  stone1: pdu1 -> port=13, switch=1; pdu2 -> port=13, switch=2
  stone2: pdu1 -> port=20, switch=1; pdu2 -> port=20, switch=2
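To make the question concrete, here is a rough sketch of how the stone1 settings described above might appear in the node's fence section of cluster.conf. It only mirrors the port and switch values given above; the method name is a placeholder, and this is not a verified working configuration.

```xml
<!-- Sketch only: mirrors the values described above, not a known-good config -->
<clusternode name="stone1">
  <fence>
    <!-- method name "1" is a placeholder for the single fence level -->
    <method name="1">
      <device name="pdu1" port="13" switch="1"/>
      <device name="pdu2" port="13" switch="2"/>
    </method>
  </fence>
</clusternode>
```

The stone2 entry would be the same apart from the node name and port="20".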
After configuring as described above, with both nodes in the cluster
running and my application running on stone1, I pulled out the Ethernet
cables from stone1 to simulate the server going down. I expected my
application to fail over to stone2 and stone1 to be fenced (i.e.,
rebooted or shut down). What actually happened is that my application
started on stone2, but stone1 was not fenced. In fact, when I reconnected
the cables, my application was still running on stone1! It seems there
are now 2 instances of my application running, one on stone1 and one on
stone2.
Why has the fencing failed? I've read somewhere that the acpid service
plays a part and that I need to disable it. Is that true? When I check
/var/log/messages, I see a "cman :sendmsg failed -101" error. What does
this mean?
I've been trying to solve this problem for the last few days, but to no
avail. Any advice will be appreciated.
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster