Hi,
I am testing my cluster nodes on an IBM BladeCenter, but I have a
problem: fencing does not work correctly.
I have 2 nodes:
node1 : 10.0.20.34
node2 : 10.0.20.35
When I fence the nodes manually with the commands fence_node node1 and
fence_node node2, it works correctly.
When a service is running on node2 and I disconnect node1 from the
network (to force the fence), it works correctly too.
My problem is that when a service is running on node1 and I disconnect
node2 from the network, node2 is not fenced.
Below are the logs from both servers. On node2 you can see that the
fence returns success, but there is no such message on node1.
Have you ever experienced this kind of problem?
Do you have any suggestions on what I should do?
I ran fence_tool dump on the node that does not fence, and it shows this
output:
1219238621 stop default
1219238621 start default 4 members 1
1219238621 do_recovery stop 1 start 4 finish 1
1219238621 add node 2 to list 1
1219238621 averting fence of node 10.0.20.35
1219238621 finish default 4
1219238681 client 4: dump
Does anyone know why it shows the message "averting fence of node" and
does not fence the node?
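To make this easier to spot in a longer dump, here is a minimal Python
sketch that scans saved fence_tool dump output for "averting fence"
lines. The function name find_averted_fences and the regular expression
are my own assumptions, based only on the line format shown above; they
are not part of the cluster tools.

```python
import re

# Assumed line shape, taken from the dump above: "<epoch> <message>".
AVERT_RE = re.compile(r"^(\d+) averting fence of node (\S+)$")


def find_averted_fences(dump_text):
    """Return (timestamp, node) pairs for every 'averting fence' line."""
    hits = []
    for line in dump_text.splitlines():
        match = AVERT_RE.match(line.strip())
        if match:
            hits.append((int(match.group(1)), match.group(2)))
    return hits


# The dump from this message, used as sample input.
SAMPLE = """\
1219238621 stop default
1219238621 start default 4 members 1
1219238621 do_recovery stop 1 start 4 finish 1
1219238621 add node 2 to list 1
1219238621 averting fence of node 10.0.20.35
1219238621 finish default 4
1219238681 client 4: dump
"""

if __name__ == "__main__":
    for timestamp, node in find_averted_fences(SAMPLE):
        print(timestamp, node)
```

The script only reports where the message occurs; it does not explain
why fenced decided to avert the fence.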
Thanks for the help.
node1:
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] entering GATHER state from 0.
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] Creating commit token because I am the rep.
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] Saving state aru 57 high seq received 57
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] Storing new sequence id for ring 1edb5c
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] entering COMMIT state.
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] entering RECOVERY state.
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] position [0] member 10.0.20.34:
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] previous ring seq 2022232 rep 10.0.20.34
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] aru 57 high delivered 57 received flag 1
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] Did not need to originate any messages in recovery.
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] Sending initial ORF token
Aug 18 11:11:46 node1 openais[3515]: [CLM ] CLM CONFIGURATION CHANGE
Aug 18 11:11:46 node1 openais[3515]: [CLM ] New Configuration:
Aug 18 11:11:46 node1 kernel: dlm: closing connection to node 2
Aug 18 11:11:46 node1 fenced: 10.0.20.35 not a cluster member after 0 sec post_fail_delay
Aug 18 11:11:46 node1 openais[3515]: [CLM ] r(0) ip(10.0.20.34)
Aug 18 11:11:46 node1 openais[3515]: [CLM ] Members Left:
Aug 18 11:11:46 node1 openais[3515]: [CLM ] r(0) ip(10.0.20.35)
Aug 18 11:11:46 node1 openais[3515]: [CLM ] Members Joined:
Aug 18 11:11:46 node1 openais[3515]: [CLM ] CLM CONFIGURATION CHANGE
Aug 18 11:11:46 node1 openais[3515]: [CLM ] New Configuration:
Aug 18 11:11:46 node1 openais[3515]: [CLM ] r(0) ip(10.0.20.34)
Aug 18 11:11:46 node1 openais[3515]: [CLM ] Members Left:
Aug 18 11:11:46 node1 openais[3515]: [CLM ] Members Joined:
Aug 18 11:11:46 node1 openais[3515]: [SYNC ] This node is within the primary component and will provide service.
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] entering OPERATIONAL state.
Aug 18 11:11:46 node1 openais[3515]: [CLM ] got nodejoin message 10.0.20.34
Aug 18 11:11:46 node1 openais[3515]: [CPG ] got joinlist message from node 1
node2:
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] entering GATHER state from 0.
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] Creating commit token because I am the rep.
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] Saving state aru 53 high seq received 53
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] Storing new sequence id for ring 1edb7c
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] entering COMMIT state.
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] entering RECOVERY state.
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] position [0] member 10.0.20.35:
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] previous ring seq 2022264 rep 10.0.20.34
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] aru 53 high delivered 53 received flag 1
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] Did not need to originate any messages in recovery.
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] Sending initial ORF token
Aug 18 15:55:52 node2 openais[5232]: [CLM ] CLM CONFIGURATION CHANGE
Aug 18 15:55:52 node2 openais[5232]: [CLM ] New Configuration:
Aug 18 15:55:52 node2 kernel: dlm: closing connection to node 1
Aug 18 15:55:52 node2 fenced[5248]: 10.0.20.34 not a cluster member after 0 sec post_fail_delay
Aug 18 15:55:52 node2 openais[5232]: [CLM ] r(0) ip(10.0.20.35)
Aug 18 15:55:52 node2 fenced[5248]: fencing node "10.0.20.34"
Aug 18 15:55:53 node2 openais[5232]: [CLM ] Members Left:
Aug 18 15:55:53 node2 openais[5232]: [CLM ] r(0) ip(10.0.20.34)
Aug 18 15:55:53 node2 openais[5232]: [CLM ] Members Joined:
Aug 18 15:55:53 node2 openais[5232]: [CLM ] CLM CONFIGURATION CHANGE
Aug 18 15:55:53 node2 openais[5232]: [CLM ] New Configuration:
Aug 18 15:55:53 node2 openais[5232]: [CLM ] r(0) ip(10.0.20.35)
Aug 18 15:55:53 node2 openais[5232]: [CLM ] Members Left:
Aug 18 15:55:53 node2 openais[5232]: [CLM ] Members Joined:
Aug 18 15:55:53 node2 openais[5232]: [SYNC ] This node is within the primary component and will provide service.
Aug 18 15:55:53 node2 openais[5232]: [TOTEM] entering OPERATIONAL state.
Aug 18 15:55:53 node2 openais[5232]: [CLM ] got nodejoin message 10.0.20.35
Aug 18 15:55:53 node2 openais[5232]: [CPG ] got joinlist message from node 2
Aug 18 15:55:59 node2 fenced[5248]: fence "10.0.20.34" success
Aug 18 15:56:00 node2 clurgmgrd[5507]: <notice> Taking over service service:FirewallClusta from down member 10.0.20.34
--
Bruno F. Deschamps - Consultor
Profissional Certificado LPIC-1
--------------------------------------------------------------------
Redix - Gestão em T.I. com Software Livre
http://www.redix.com.br - redix@xxxxxxxxxxxx
Tel. Coml.: +55 (47) 3323-7313
--------------------------------------------------------------------