Hi. I have a two-node CentOS 5.3 cluster. If I run "service cman
restart" on one node, or stop and then start cman a few seconds apart,
the other node does not notice that the restarted node has rejoined
and keeps showing its peer as offline forever.
For example:
* Before cman restart:
node1# cman_tool status
Version: 6.1.0
Config Version: 6
Cluster Name: CSVirtualizacion
Cluster Id: 42648
Cluster Member: Yes
Cluster Generation: 202600
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Quorum: 1
Active subsystems: 7
Flags: 2node Dirty
Ports Bound: 0
Node name: patty
Node ID: 1
Multicast addresses: 224.0.0.133
Node addresses: 138.100.8.70
* After cman stop on node2 (fewer seconds elapsed than the token timeout):
node1# cman_tool status
Version: 6.1.0
Config Version: 6
Cluster Name: CSVirtualizacion
Cluster Id: 42648
Cluster Member: Yes
Cluster Generation: 202600
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 1
Quorum: 1
Active subsystems: 7
Flags: 2node Dirty
Ports Bound: 0
Node name: patty
Node ID: 1
Multicast addresses: 224.0.0.133
Node addresses: 138.100.8.70
Wed May 6 12:29:38 CEST 2009
* After cman stop on node2 (more seconds elapsed than the token timeout):
node1# date; cman_tool status
Version: 6.1.0
Config Version: 6
Cluster Name: CSVirtualizacion
Cluster Id: 42648
Cluster Member: Yes
Cluster Generation: 202604
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Quorum: 1
Active subsystems: 7
Flags: 2node Dirty
Ports Bound: 0
Node name: patty
Node ID: 1
Multicast addresses: 224.0.0.133
Node addresses: 138.100.8.70
Wed May 6 12:29:47 CEST 2009
/var/log/messages:
May 6 12:35:20 node2 openais[17262]: [TOTEM] The token was lost in the OPERATIONAL state.
May 6 12:35:20 node2 openais[17262]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
May 6 12:35:20 node2 openais[17262]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
May 6 12:35:20 node2 openais[17262]: [TOTEM] entering GATHER state from 2.
May 6 12:35:25 node2 openais[17262]: [TOTEM] entering GATHER state from 0.
May 6 12:35:25 node2 openais[17262]: [TOTEM] Creating commit token because I am the rep.
May 6 12:35:25 node2 openais[17262]: [TOTEM] Saving state aru 26 high seq received 26
May 6 12:35:25 node2 openais[17262]: [TOTEM] Storing new sequence id for ring 31780
May 6 12:35:25 node2 openais[17262]: [TOTEM] entering COMMIT state.
May 6 12:35:25 node2 openais[17262]: [TOTEM] entering RECOVERY state.
May 6 12:35:25 node2 openais[17262]: [TOTEM] position [0] member 10.10.8.70:
May 6 12:35:25 node2 openais[17262]: [TOTEM] previous ring seq 202620 rep 10.10.8.70
May 6 12:35:25 node2 openais[17262]: [TOTEM] aru 26 high delivered 26 received flag 1
May 6 12:35:25 node2 openais[17262]: [TOTEM] Did not need to originate any messages in recovery.
May 6 12:35:25 node2 openais[17262]: [TOTEM] Sending initial ORF token
May 6 12:35:25 node2 openais[17262]: [CLM ] CLM CONFIGURATION CHANGE
May 6 12:35:25 node2 openais[17262]: [CLM ] New Configuration:
May 6 12:35:25 node2 openais[17262]: [CLM ] r(0) ip(10.10.8.70)
May 6 12:35:25 node2 openais[17262]: [CLM ] Members Left:
May 6 12:35:25 node2 openais[17262]: [CLM ] r(0) ip(10.10.8.71)
May 6 12:35:25 node2 openais[17262]: [CLM ] Members Joined:
May 6 12:35:25 node2 openais[17262]: [CLM ] CLM CONFIGURATION CHANGE
May 6 12:35:25 node2 openais[17262]: [CLM ] New Configuration:
May 6 12:35:25 node2 openais[17262]: [CLM ] r(0) ip(10.10.8.70)
May 6 12:35:25 node2 openais[17262]: [CLM ] Members Left:
May 6 12:35:25 node2 openais[17262]: [CLM ] Members Joined:
May 6 12:35:25 node2 openais[17262]: [SYNC ] This node is within the primary component and will provide service.
May 6 12:35:25 node2 openais[17262]: [TOTEM] entering OPERATIONAL state.
May 6 12:35:25 node2 kernel: dlm: closing connection to node 2
May 6 12:35:25 node2 openais[17262]: [CLM ] got nodejoin message 10.10.8.70
May 6 12:35:25 node2 openais[17262]: [CPG ] got joinlist message from node 1
If node2 runs cman start without waiting until the loss of the token in
the OPERATIONAL state has been detected, node1 sees node2 as offline
forever. Subsequent cman restarts don't change this state:
node1# cman_tool nodes
Node  Sts   Inc      Joined               Name
   1   M    202616   2009-05-06 12:34:43  node1
   2   X    202628                        node2
node2# cman_tool nodes
Node  Sts   Inc      Joined               Name
   1   M    202644   2009-05-06 12:51:04  node1
   2   M    202640   2009-05-06 12:51:04  node2
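To compare the two views without reading the tables by eye, the Sts
column can be pulled out of the "cman_tool nodes" output. This is only a
rough sketch: node_status is a hypothetical helper of mine, not part of
cman, and it assumes the node name is the last field on each line, as in
the output above.

```shell
#!/bin/sh
# node_status NAME
# Reads "cman_tool nodes" output on stdin and prints the Sts value
# (M = member, X = not a member) for the node called NAME.
# Hypothetical helper; assumes the name is the last field of the line.
node_status() {
    awk -v name="$1" 'NR > 1 && $NF == name { print $2 }'
}

# Example, run on each node and compared afterwards:
#   cman_tool nodes | node_status node2
```

On node1 this prints X for node2 while node2 itself prints M, which is
the disagreement shown above.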
Is a delay between cman stop and cman start necessary to avoid this
inconsistent state, or is this really a bug?
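For reference, the delayed restart I have in mind would look roughly like
the sketch below. The function names are hypothetical, and the token
value (in milliseconds) and the safety margin are assumptions passed in
by the caller; the real token timeout should be taken from your own
cluster configuration. Whether waiting like this is really required is
exactly what I am asking.

```shell
#!/bin/sh
# Sketch: restart cman with a pause longer than the totem token timeout,
# so the other node has time to notice the departure before the rejoin.

# restart_delay_seconds TOKEN_MS MARGIN_S
# Converts a token timeout in milliseconds to whole seconds (rounded up)
# and adds a safety margin in seconds.
restart_delay_seconds() {
    token_ms=$1
    margin_s=$2
    echo $(( ($token_ms + 999) / 1000 + $margin_s ))
}

# delayed_cman_restart TOKEN_MS MARGIN_S
# Stops cman, waits past the token timeout, then starts it again.
delayed_cman_restart() {
    delay=$(restart_delay_seconds "$1" "$2")
    service cman stop || return 1
    echo "waiting ${delay}s before starting cman again"
    sleep "$delay"
    service cman start
}

# Example (as root on the node being restarted; 10000 ms and 5 s are
# placeholder values, not taken from my configuration):
#   delayed_cman_restart 10000 5
```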
Regards.