Cluster Fence not working on all nodes

Hi

I am testing my cluster nodes on an IBM BladeCenter, but I have a problem: fencing does not work correctly.
I have 2 nodes:
node1 : 10.0.20.34
node2 : 10.0.20.35

When I fence the nodes manually with the commands fence_node node1 and fence_node node2, it works correctly.
When a service is running on node2 and I disconnect node1 from the network (to force a fence), it also works correctly.
My problem occurs when a service is running on node1 and I disconnect node2 from the network: node2 is not fenced.

Here are the logs from both servers, where you can see the fencing at work. On node2 you can see that the fence returns success, but not on node1.

Have you ever experienced this kind of problem?
Do you have any suggestions on what I should do?


I ran fence_tool dump on the node that does not fence, and it shows this output:

1219238621 stop default
1219238621 start default 4 members 1
1219238621 do_recovery stop 1 start 4 finish 1
1219238621 add node 2 to list 1
1219238621 averting fence of node 10.0.20.35
1219238621 finish default 4
1219238681 client 4: dump


Does anyone know why it shows the message "averting fence of node" and does not fence the node?
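
To pick out the relevant line and see when it happened, here is a minimal Python sketch. It assumes every line of the fence_tool dump output starts with a Unix timestamp followed by the message text, as in the output pasted above. The function name find_averted_fences and the SAMPLE_DUMP string are only for illustration and are not part of the cluster tools.

```python
from datetime import datetime, timezone


def find_averted_fences(dump_text):
    """Return (utc_time, message) pairs for 'averting fence' lines.

    Assumes each line looks like '<unix timestamp> <message>'.
    Lines that do not fit that shape are skipped.
    """
    hits = []
    for line in dump_text.splitlines():
        parts = line.strip().split(maxsplit=1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        stamp, message = parts
        if message.startswith("averting fence"):
            when = datetime.fromtimestamp(int(stamp), tz=timezone.utc)
            hits.append((when, message))
    return hits


# Copied from the dump in this message.
SAMPLE_DUMP = """\
1219238621 stop default
1219238621 start default 4 members 1
1219238621 do_recovery stop 1 start 4 finish 1
1219238621 add node 2 to list 1
1219238621 averting fence of node 10.0.20.35
1219238621 finish default 4
1219238681 client 4: dump
"""

for when, message in find_averted_fences(SAMPLE_DUMP):
    print(when.isoformat(), message)
```

The timestamps are converted to UTC so they can be compared with the syslog times on each node once the local time zone is taken into account.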

Thanks for the help.



node1:
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] entering GATHER state from 0.
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] Creating commit token because I am the rep.
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] Saving state aru 57 high seq received 57
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] Storing new sequence id for ring 1edb5c
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] entering COMMIT state.
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] entering RECOVERY state.
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] position [0] member 10.0.20.34:
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] previous ring seq 2022232 rep 10.0.20.34
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] aru 57 high delivered 57 received flag 1
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] Did not need to originate any messages in recovery.
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] Sending initial ORF token
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] New Configuration:
Aug 18 11:11:46 node1 kernel: dlm: closing connection to node 2
Aug 18 11:11:46 node1 fenced: 10.0.20.35 not a cluster member after 0 sec post_fail_delay
Aug 18 11:11:46 node1 openais[3515]: [CLM  ]    r(0) ip(10.0.20.34
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] Members Left:
Aug 18 11:11:46 node1 openais[3515]: [CLM  ]    r(0) ip(10.0.20.35
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] Members Joined:
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] New Configuration:
Aug 18 11:11:46 node1 openais[3515]: [CLM  ]    r(0) ip(10.0.20.34
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] Members Left:
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] Members Joined:
Aug 18 11:11:46 node1 openais[3515]: [SYNC ] This node is within the primary component and will provide service.
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] entering OPERATIONAL state.
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] got nodejoin message 10.0.20.34
Aug 18 11:11:46 node1 openais[3515]: [CPG  ] got joinlist message from node 1



node2:
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] entering GATHER state from 0.
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] Creating commit token because I am the rep.
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] Saving state aru 53 high seq received 53
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] Storing new sequence id for ring 1edb7c
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] entering COMMIT state.
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] entering RECOVERY state.
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] position [0] member 10.0.20.35:
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] previous ring seq 2022264 rep 10.0.20.34
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] aru 53 high delivered 53 received flag 1
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] Did not need to originate any messages in recovery.
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] Sending initial ORF token
Aug 18 15:55:52 node2 openais[5232]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 18 15:55:52 node2 openais[5232]: [CLM  ] New Configuration:
Aug 18 15:55:52 node2 kernel: dlm: closing connection to node 1
Aug 18 15:55:52 node2 fenced[5248]: 10.0.20.34 not a cluster member after 0 sec post_fail_delay
Aug 18 15:55:52 node2 openais[5232]: [CLM  ]    r(0) ip(10.0.20.35
Aug 18 15:55:52 node2 fenced[5248]: fencing node "10.0.20.34"
Aug 18 15:55:53 node2 openais[5232]: [CLM  ] Members Left:
Aug 18 15:55:53 node2 openais[5232]: [CLM  ]    r(0) ip(10.0.20.34
Aug 18 15:55:53 node2 openais[5232]: [CLM  ] Members Joined:
Aug 18 15:55:53 node2 openais[5232]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 18 15:55:53 node2 openais[5232]: [CLM  ] New Configuration:
Aug 18 15:55:53 node2 openais[5232]: [CLM  ]    r(0) ip(10.0.20.35
Aug 18 15:55:53 node2 openais[5232]: [CLM  ] Members Left:
Aug 18 15:55:53 node2 openais[5232]: [CLM  ] Members Joined:
Aug 18 15:55:53 node2 openais[5232]: [SYNC ] This node is within the primary component and will provide service.
Aug 18 15:55:53 node2 openais[5232]: [TOTEM] entering OPERATIONAL state.
Aug 18 15:55:53 node2 openais[5232]: [CLM  ] got nodejoin message 10.0.20.35
Aug 18 15:55:53 node2 openais[5232]: [CPG  ] got joinlist message from node 2
Aug 18 15:55:59 node2 fenced[5248]: fence "10.0.20.34" success
Aug 18 15:56:00 node2 clurgmgrd[5507]: <notice> Taking over service service:FirewallClusta from down member 10.0.20.34
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
