On Wed, May 6, 2009 at 7:01 AM, Chrissie Caulfield <ccaulfie@xxxxxxxxxx> wrote:
> Miguel Sanchez wrote:
>> Hi. I have a CentOS 5.3 cluster with two nodes. If I run "service cman
>> restart" on one node, or stop + start within a few seconds, the other
>> node doesn't recognize the returning member and keeps showing its
>> peer as offline forever.
>>
>> For example:
>>
>> * Before cman restart:
>>
>> node1# cman_tool status
>> Version: 6.1.0
>> Config Version: 6
>> Cluster Name: CSVirtualizacion
>> Cluster Id: 42648
>> Cluster Member: Yes
>> Cluster Generation: 202600
>> Membership state: Cluster-Member
>> Nodes: 2
>> Expected votes: 1
>> Total votes: 2
>> Quorum: 1
>> Active subsystems: 7
>> Flags: 2node Dirty
>> Ports Bound: 0
>> Node name: patty
>> Node ID: 1
>> Multicast addresses: 224.0.0.133
>> Node addresses: 138.100.8.70
>>
>> * After cman stop on node2 (fewer seconds elapsed than the token parameter):
>>
>> node1# cman_tool status
>> Version: 6.1.0
>> Config Version: 6
>> Cluster Name: CSVirtualizacion
>> Cluster Id: 42648
>> Cluster Member: Yes
>> Cluster Generation: 202600
>> Membership state: Cluster-Member
>> Nodes: 2
>> Expected votes: 1
>> Total votes: 1
>> Quorum: 1
>> Active subsystems: 7
>> Flags: 2node Dirty
>> Ports Bound: 0
>> Node name: patty
>> Node ID: 1
>> Multicast addresses: 224.0.0.133
>> Node addresses: 138.100.8.70
>> Wed May 6 12:29:38 CEST 2009
>>
>> * After cman stop on node2 (more seconds elapsed than the token parameter):
>>
>> node1# date; cman_tool status
>> Version: 6.1.0
>> Config Version: 6
>> Cluster Name: CSVirtualizacion
>> Cluster Id: 42648
>> Cluster Member: Yes
>> Cluster Generation: 202604
>> Membership state: Cluster-Member
>> Nodes: 1
>> Expected votes: 1
>> Total votes: 1
>> Quorum: 1
>> Active subsystems: 7
>> Flags: 2node Dirty
>> Ports Bound: 0
>> Node name: patty
>> Node ID: 1
>> Multicast addresses: 224.0.0.133
>> Node addresses: 138.100.8.70
>> Wed May 6 12:29:47 CEST 2009
>>
>> /var/log/messages:
>> May 6 12:35:20 node2 openais[17262]: [TOTEM] The token was lost in the OPERATIONAL state.
>> May 6 12:35:20 node2 openais[17262]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
>> May 6 12:35:20 node2 openais[17262]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
>> May 6 12:35:20 node2 openais[17262]: [TOTEM] entering GATHER state from 2.
>> May 6 12:35:25 node2 openais[17262]: [TOTEM] entering GATHER state from 0.
>> May 6 12:35:25 node2 openais[17262]: [TOTEM] Creating commit token because I am the rep.
>> May 6 12:35:25 node2 openais[17262]: [TOTEM] Saving state aru 26 high seq received 26
>> May 6 12:35:25 node2 openais[17262]: [TOTEM] Storing new sequence id for ring 31780
>> May 6 12:35:25 node2 openais[17262]: [TOTEM] entering COMMIT state.
>> May 6 12:35:25 node2 openais[17262]: [TOTEM] entering RECOVERY state.
>> May 6 12:35:25 node2 openais[17262]: [TOTEM] position [0] member 10.10.8.70:
>> May 6 12:35:25 node2 openais[17262]: [TOTEM] previous ring seq 202620 rep 10.10.8.70
>> May 6 12:35:25 node2 openais[17262]: [TOTEM] aru 26 high delivered 26 received flag 1
>> May 6 12:35:25 node2 openais[17262]: [TOTEM] Did not need to originate any messages in recovery.
>> May 6 12:35:25 node2 openais[17262]: [TOTEM] Sending initial ORF token
>> May 6 12:35:25 node2 openais[17262]: [CLM ] CLM CONFIGURATION CHANGE
>> May 6 12:35:25 node2 openais[17262]: [CLM ] New Configuration:
>> May 6 12:35:25 node2 openais[17262]: [CLM ] r(0) ip(10.10.8.70)
>> May 6 12:35:25 node2 openais[17262]: [CLM ] Members Left:
>> May 6 12:35:25 node2 openais[17262]: [CLM ] r(0) ip(10.10.8.71)
>> May 6 12:35:25 node2 openais[17262]: [CLM ] Members Joined:
>> May 6 12:35:25 node2 openais[17262]: [CLM ] CLM CONFIGURATION CHANGE
>> May 6 12:35:25 node2 openais[17262]: [CLM ] New Configuration:
>> May 6 12:35:25 node2 openais[17262]: [CLM ] r(0) ip(10.10.8.70)
>> May 6 12:35:25 node2 openais[17262]: [CLM ] Members Left:
>> May 6 12:35:25 node2 openais[17262]: [CLM ] Members Joined:
>> May 6 12:35:25 node2 openais[17262]: [SYNC ] This node is within the primary component and will provide service.
>> May 6 12:35:25 node2 openais[17262]: [TOTEM] entering OPERATIONAL state.
>> May 6 12:35:25 node2 kernel: dlm: closing connection to node 2
>> May 6 12:35:25 node2 openais[17262]: [CLM ] got nodejoin message 10.10.8.70
>> May 6 12:35:25 node2 openais[17262]: [CPG ] got joinlist message from node 1
>>
>>
>> If node2 runs cman start before node1 has detected the loss of the
>> operational token, node1 sees node2 as offline forever. Further
>> cman restarts don't change this state:
>>
>> node1# cman_tool nodes
>> Node  Sts   Inc     Joined               Name
>>    1   M    202616  2009-05-06 12:34:43  node1
>>    2   X    202628                       node2
>>
>> node2# cman_tool nodes
>> Node  Sts   Inc     Joined               Name
>>    1   M    202644  2009-05-06 12:51:04  node1
>>    2   M    202640  2009-05-06 12:51:04  node2
>>
>>
>> Is a delay between cman stop and start necessary to avoid this
>> inconsistent state, or is it really a bug?
>
>
> I suspect it's an instance of this known bug.
> Check that CentOS has the appropriate patch available:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=485026
>
> Chrissie
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster
>

When restarting cman, I have always had to stop cman and then manually stop openais before trying to start cman again. If I skip these steps, the node either never rejoins the cluster or may fence the other node.
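The stop / stop openais / start sequence described above could be wrapped in a small shell function. This is only a sketch under assumptions that do not come from this thread: the openais daemon is assumed to run under the process name aisexec, the 10-second grace period is arbitrary, and the DRY_RUN switch and the helper names (run, ais_running, restart_cman) are made up for illustration. It is not a fix for the bug referenced above.

```shell
#!/bin/sh
# Sketch: stop cman, make sure the openais daemon is gone, then start cman.
# Assumptions (not from the thread): the daemon's process name is "aisexec"
# and 10 seconds is a long enough grace period; adjust both for your system.

AIS_PROC="${AIS_PROC:-aisexec}"   # assumed process name of the openais daemon
GRACE="${GRACE:-10}"              # assumed seconds to wait for it to exit

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Return 0 while the openais daemon is still running.
# In dry-run mode nothing is probed and the daemon counts as not running.
ais_running() {
    [ "${DRY_RUN:-0}" != "1" ] && pgrep -x "$AIS_PROC" >/dev/null 2>&1
}

restart_cman() {
    run service cman stop

    # Give openais a chance to exit on its own before stopping it by hand.
    waited=0
    while ais_running && [ "$waited" -lt "$GRACE" ]; do
        sleep 1
        waited=$((waited + 1))
    done

    # Stop openais manually if it is still there (always shown in dry-run).
    if ais_running || [ "${DRY_RUN:-0}" = "1" ]; then
        run killall "$AIS_PROC"
    fi

    run service cman start
}
```

With DRY_RUN=1 set, calling restart_cman only prints the three commands it would run, which is a safe way to check the sequence before trying it on a cluster node. Without it, the function must be run as root.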