Miguel Sanchez wrote:
> Hi. I have a CentOS 5.3 cluster with two nodes. If I run "service cman
> restart" on one node, or stop and then start cman after a few seconds,
> the other node doesn't notice that it has rejoined the membership and
> keeps showing it as offline forever.
>
> For example:
>
> * Before cman restart:
>
> node1# cman_tool status
> Version: 6.1.0
> Config Version: 6
> Cluster Name: CSVirtualizacion
> Cluster Id: 42648
> Cluster Member: Yes
> Cluster Generation: 202600
> Membership state: Cluster-Member
> Nodes: 2
> Expected votes: 1
> Total votes: 2
> Quorum: 1
> Active subsystems: 7
> Flags: 2node Dirty
> Ports Bound: 0
> Node name: patty
> Node ID: 1
> Multicast addresses: 224.0.0.133
> Node addresses: 138.100.8.70
>
> * After cman stop on node2 (before the token timeout has elapsed):
>
> node1# cman_tool status
> Version: 6.1.0
> Config Version: 6
> Cluster Name: CSVirtualizacion
> Cluster Id: 42648
> Cluster Member: Yes
> Cluster Generation: 202600
> Membership state: Cluster-Member
> Nodes: 2
> Expected votes: 1
> Total votes: 1
> Quorum: 1
> Active subsystems: 7
> Flags: 2node Dirty
> Ports Bound: 0
> Node name: patty
> Node ID: 1
> Multicast addresses: 224.0.0.133
> Node addresses: 138.100.8.70
> Wed May 6 12:29:38 CEST 2009
>
> * After cman stop on node2 (after the token timeout has elapsed):
>
> node1# date; cman_tool status
> Version: 6.1.0
> Config Version: 6
> Cluster Name: CSVirtualizacion
> Cluster Id: 42648
> Cluster Member: Yes
> Cluster Generation: 202604
> Membership state: Cluster-Member
> Nodes: 1
> Expected votes: 1
> Total votes: 1
> Quorum: 1
> Active subsystems: 7
> Flags: 2node Dirty
> Ports Bound: 0
> Node name: patty
> Node ID: 1
> Multicast addresses: 224.0.0.133
> Node addresses: 138.100.8.70
> Wed May 6 12:29:47 CEST 2009
>
> /var/log/messages:
> May 6 12:35:20 node2 openais[17262]: [TOTEM] The token was lost in the OPERATIONAL state.
> May 6 12:35:20 node2 openais[17262]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
> May 6 12:35:20 node2 openais[17262]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
> May 6 12:35:20 node2 openais[17262]: [TOTEM] entering GATHER state from 2.
> May 6 12:35:25 node2 openais[17262]: [TOTEM] entering GATHER state from 0.
> May 6 12:35:25 node2 openais[17262]: [TOTEM] Creating commit token because I am the rep.
> May 6 12:35:25 node2 openais[17262]: [TOTEM] Saving state aru 26 high seq received 26
> May 6 12:35:25 node2 openais[17262]: [TOTEM] Storing new sequence id for ring 31780
> May 6 12:35:25 node2 openais[17262]: [TOTEM] entering COMMIT state.
> May 6 12:35:25 node2 openais[17262]: [TOTEM] entering RECOVERY state.
> May 6 12:35:25 node2 openais[17262]: [TOTEM] position [0] member 10.10.8.70:
> May 6 12:35:25 node2 openais[17262]: [TOTEM] previous ring seq 202620 rep 10.10.8.70
> May 6 12:35:25 node2 openais[17262]: [TOTEM] aru 26 high delivered 26 received flag 1
> May 6 12:35:25 node2 openais[17262]: [TOTEM] Did not need to originate any messages in recovery.
> May 6 12:35:25 node2 openais[17262]: [TOTEM] Sending initial ORF token
> May 6 12:35:25 node2 openais[17262]: [CLM ] CLM CONFIGURATION CHANGE
> May 6 12:35:25 node2 openais[17262]: [CLM ] New Configuration:
> May 6 12:35:25 node2 openais[17262]: [CLM ] r(0) ip(10.10.8.70)
> May 6 12:35:25 node2 openais[17262]: [CLM ] Members Left:
> May 6 12:35:25 node2 openais[17262]: [CLM ] r(0) ip(10.10.8.71)
> May 6 12:35:25 node2 openais[17262]: [CLM ] Members Joined:
> May 6 12:35:25 node2 openais[17262]: [CLM ] CLM CONFIGURATION CHANGE
> May 6 12:35:25 node2 openais[17262]: [CLM ] New Configuration:
> May 6 12:35:25 node2 openais[17262]: [CLM ] r(0) ip(10.10.8.70)
> May 6 12:35:25 node2 openais[17262]: [CLM ] Members Left:
> May 6 12:35:25 node2 openais[17262]: [CLM ] Members Joined:
> May 6 12:35:25 node2 openais[17262]: [SYNC ] This node is within the primary component and will provide service.
> May 6 12:35:25 node2 openais[17262]: [TOTEM] entering OPERATIONAL state.
> May 6 12:35:25 node2 kernel: dlm: closing connection to node 2
> May 6 12:35:25 node2 openais[17262]: [CLM ] got nodejoin message 10.10.8.70
> May 6 12:35:25 node2 openais[17262]: [CPG ] got joinlist message from node 1
>
>
> If node2 runs cman start before node1 has detected that the token was
> lost in the OPERATIONAL state, node1 sees node2 as offline forever.
> Subsequent cman restarts don't change this state:
>
> node1# cman_tool nodes
> Node  Sts   Inc   Joined               Name
>    1   M  202616   2009-05-06 12:34:43  node1
>    2   X  202628                        node2
> node2# cman_tool nodes
> Node  Sts   Inc   Joined               Name
>    1   M  202644   2009-05-06 12:51:04  node1
>    2   M  202640   2009-05-06 12:51:04  node2
>
> Is a delay between cman stop and cman start necessary to avoid this
> inconsistent state, or is this really a bug?

I suspect it's an instance of this known bug.
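The inconsistent state itself shows up in the cman_tool nodes output quoted above: node1 lists node2 with status X, while node2 lists both nodes as M. A minimal sketch of a check you could run on each node is below. It is a hypothetical helper (check_members is a made-up name, not part of cman), written for POSIX sh plus awk, and it assumes the column layout shown above (Node, Sts, Inc, an optional Joined date and time, Name). It only reports nodes not listed as M; it does not fix anything.

```shell
#!/bin/sh
# check_members: hypothetical helper, not part of cman.
# Reads `cman_tool nodes` output on stdin and prints every node whose
# Sts column is not "M". Assumes the layout quoted in this thread:
#   Node  Sts  Inc  [Joined date time]  Name
# Exit status: 0 if every listed node is "M", 1 otherwise.
check_members() {
    awk '
        $1 == "Node" { next }   # skip the header line
        NF == 0      { next }   # ignore blank lines
        $2 != "M" {
            # $NF is the last field, i.e. the node name; the Joined
            # column is empty for nodes that are not members.
            printf "node %s (%s): status %s\n", $1, $NF, $2
            bad = 1
        }
        END { exit bad }        # bad is unset (0) if every node was "M"
    '
}
```

For example, "cman_tool nodes | check_members || echo 'membership disagreement'" run on node1 in the situation above would print "node 2 (node2): status X", while the same command on node2 would print nothing. Reading stdin rather than calling cman_tool directly keeps the helper testable against saved output.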
Check that CentOS has the appropriate patch available:

https://bugzilla.redhat.com/show_bug.cgi?id=485026

Chrissie

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster