I would be very appreciative if you could try a test RPM of openais for me to see if it resolves your problem. If you're willing, please let me know what your architecture is and I'll build you one.

Regards
-steve

On Tue, 2007-07-10 at 18:05 -0400, james anderson wrote:
> Steve/Paul,
>
> I am not sure why, but my emails to the linux-cluster forum have been
> getting eaten?!
>
> In short when node 3 is shut down the other 2 nodes lose quorum with
> each other. This seems wrong. Any ideas?
>
> *** Steady state cluster happy***
> [root@node1 ~]# cman_tool nodes
> Node  Sts   Inc   Joined               Name
>    1   M    704   2007-07-10 13:47:51  node1
>    2   M    708   2007-07-10 13:52:54  node2
>    3   M    708   2007-07-10 13:52:54  node3
>
> *** node 3 shutdown ***
> [root@node1 ~]# cman_tool nodes
> Node  Sts   Inc   Joined               Name
>    1   M    704   2007-07-10 13:47:51  node1
>    2   X    708                        node2
>    3   X    708                        node3
>
> *** Time elapsed node 3 still down ***
> [root@node1 ~]# cman_tool nodes
> NOTE: There are 1 disallowed nodes,
> members list may seem inconsistent across the cluster
> Node  Sts   Inc   Joined               Name
>    1   M    704   2007-07-10 13:47:51  node1
>    2   d    708   2007-07-10 13:52:54  node2
>    3   X    708                        node3
>
> Jul 10 13:52:54 node2 openais[3136]: [CLM ] CLM CONFIGURATION CHANGE
> Jul 10 13:52:54 node2 openais[3136]: [CLM ] New Configuration:
> Jul 10 13:52:54 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.18)
> Jul 10 13:52:54 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.19)
> Jul 10 13:52:54 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.20)
> Jul 10 13:52:54 node2 openais[3136]: [CLM ] Members Left:
> Jul 10 13:52:54 node2 openais[3136]: [CLM ] Members Joined:
> Jul 10 13:52:54 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.18)
> Jul 10 13:52:54 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.20)
> Jul 10 13:52:54 node2 openais[3136]: [SYNC ] This node is within the primary component and will provide service.
> Jul 10 13:52:54 node2 openais[3136]: [TOTEM] entering OPERATIONAL state.
> Jul 10 13:52:54 node2 openais[3136]: [CMAN ] quorum regained, resuming > activity > Jul 10 13:52:54 node2 openais[3136]: [CLM ] got nodejoin message > 10.1.1.18 > Jul 10 13:52:54 node2 openais[3136]: [CLM ] got nodejoin message > 10.1.1.19 > Jul 10 13:52:54 node2 openais[3136]: [CLM ] got nodejoin message > 10.1.1.20 > Jul 10 13:52:54 node2 openais[3136]: [CPG ] got joinlist message from > node 1 > Jul 10 13:52:54 node2 openais[3136]: [CPG ] got joinlist message from > node 2 > Jul 10 13:52:54 node2 openais[3136]: [CPG ] got joinlist message from > node 3 > Jul 10 13:59:28 node2 openais[3136]: [TOTEM] The token was lost in the > OPERATIONAL state. > Jul 10 13:59:28 node2 openais[3136]: [TOTEM] Receive multicast socket > recv buffer size (262142 bytes). > Jul 10 13:59:28 node2 openais[3136]: [TOTEM] Transmit multicast socket > send buffer size (262142 bytes). > Jul 10 13:59:28 node2 openais[3136]: [TOTEM] entering GATHER state > from 2. > Jul 10 13:59:32 node2 openais[3136]: [TOTEM] entering GATHER state > from 0. > Jul 10 13:59:32 node2 openais[3136]: [TOTEM] Creating commit token > because I am the rep. > Jul 10 13:59:32 node2 openais[3136]: [TOTEM] Saving state aru 21 high > seq received 21 > Jul 10 13:59:32 node2 openais[3136]: [TOTEM] Storing new sequence id > for ring 2c8 > Jul 10 13:59:32 node2 openais[3136]: [TOTEM] entering COMMIT state. > Jul 10 13:59:32 node2 openais[3136]: [TOTEM] entering RECOVERY state. > Jul 10 13:59:32 node2 openais[3136]: [TOTEM] position [0] member > 10.1.1.19: > Jul 10 13:59:32 node2 openais[3136]: [TOTEM] previous ring seq 708 rep > 10.1.1.18 > Jul 10 13:59:32 node2 openais[3136]: [TOTEM] aru 21 high delivered 21 > received flag 0 > Jul 10 13:59:32 node2 openais[3136]: [TOTEM] Did not need to originate > any messages in recovery. 
> Jul 10 13:59:32 node2 openais[3136]: [TOTEM] Sending initial ORF token > Jul 10 13:59:32 node2 openais[3136]: [CLM ] CLM CONFIGURATION CHANGE > Jul 10 13:59:32 node2 openais[3136]: [CLM ] New Configuration: > Jul 10 13:59:32 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.19) > Jul 10 13:59:32 node2 openais[3136]: [CLM ] Members Left: > Jul 10 13:59:32 node2 openais[3136]: [CLM ] no interface found for > nodeid > Jul 10 13:59:32 node2 openais[3136]: [CLM ] no interface found for > nodeid > Jul 10 13:59:32 node2 openais[3136]: [CLM ] Members Joined: > Jul 10 13:59:32 node2 openais[3136]: [CMAN ] quorum lost, blocking > activity > Jul 10 13:59:32 node2 openais[3136]: [SYNC ] This node is within the > primary component and will provide service. > Jul 10 13:59:32 node2 openais[3136]: [CLM ] CLM CONFIGURATION CHANGE > Jul 10 13:59:32 node2 openais[3136]: [CLM ] New Configuration: > Jul 10 13:59:32 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.19) > Jul 10 13:59:32 node2 openais[3136]: [CLM ] Members Left: > Jul 10 13:59:32 node2 openais[3136]: [CLM ] Members Joined: > Jul 10 13:59:32 node2 openais[3136]: [SYNC ] This node is within the > primary component and will provide service. > Jul 10 13:59:32 node2 openais[3136]: [TOTEM] entering OPERATIONAL > state. > Jul 10 13:59:32 node2 openais[3136]: [CLM ] got nodejoin message > 10.1.1.19 > Jul 10 13:59:32 node2 openais[3136]: [CPG ] got joinlist message from > node 2 > Jul 10 14:02:54 node2 openais[3136]: [TOTEM] entering GATHER state > from 9. > Jul 10 14:02:54 node2 openais[3136]: [TOTEM] Saving state aru b high > seq received b > Jul 10 14:02:54 node2 openais[3136]: [TOTEM] Storing new sequence id > for ring 2cc > Jul 10 14:02:54 node2 openais[3136]: [TOTEM] entering COMMIT state. > Jul 10 14:02:54 node2 openais[3136]: [TOTEM] entering RECOVERY state. 
> Jul 10 14:02:54 node2 openais[3136]: [TOTEM] position [0] member > 10.1.1.18: > Jul 10 14:02:54 node2 openais[3136]: [TOTEM] previous ring seq 712 rep > 10.1.1.18 > Jul 10 14:02:54 node2 openais[3136]: [TOTEM] aru b high delivered b > received flag 0 > Jul 10 14:02:54 node2 openais[3136]: [TOTEM] position [1] member > 10.1.1.19: > Jul 10 14:02:54 node2 openais[3136]: [TOTEM] previous ring seq 712 rep > 10.1.1.19 > Jul 10 14:02:54 node2 openais[3136]: [TOTEM] aru b high delivered b > received flag 0 > Jul 10 14:02:54 node2 openais[3136]: [TOTEM] Did not need to originate > any messages in recovery. > Jul 10 14:02:54 node2 openais[3136]: [CLM ] CLM CONFIGURATION CHANGE > Jul 10 14:02:54 node2 openais[3136]: [CLM ] New Configuration: > Jul 10 14:02:54 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.19) > Jul 10 14:02:54 node2 openais[3136]: [CLM ] Members Left: > Jul 10 14:02:54 node2 openais[3136]: [CLM ] Members Joined: > Jul 10 14:02:54 node2 openais[3136]: [SYNC ] This node is within the > primary component and will provide service. > Jul 10 14:02:54 node2 openais[3136]: [CLM ] CLM CONFIGURATION CHANGE > Jul 10 14:02:54 node2 openais[3136]: [CLM ] New Configuration: > Jul 10 14:02:54 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.18) > Jul 10 14:02:54 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.19) > Jul 10 14:02:54 node2 openais[3136]: [CLM ] Members Left: > Jul 10 14:02:54 node2 openais[3136]: [CLM ] Members Joined: > Jul 10 14:02:54 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.18) > Jul 10 14:02:54 node2 openais[3136]: [SYNC ] This node is within the > primary component and will provide service. > Jul 10 14:02:54 node2 openais[3136]: [TOTEM] entering OPERATIONAL > state. 
> Jul 10 14:02:54 node2 openais[3136]: [MAIN ] Node node1 not joined to > cman because it has rejoined an inquorate cluster > Jul 10 14:02:54 node2 openais[3136]: [CLM ] got nodejoin message > 10.1.1.18 > Jul 10 14:02:54 node2 openais[3136]: [CLM ] got nodejoin message > 10.1.1.19 > Jul 10 14:02:54 node2 openais[3136]: [CPG ] got joinlist message from > node 1 > Jul 10 14:02:54 node2 openais[3136]: [CPG ] got joinlist message from > node 2 > > *** node 3 back up *** > [root@node1 init.d]# cman_tool nodes > Node Sts Inc Joined Name > 1 M 704 2007-07-10 13:47:51 node1 > 2 X 708 node2 > 3 X 708 node3 > > Jul 10 14:13:09 node2 openais[3136]: [TOTEM] The consensus timeout > expired. > Jul 10 14:13:09 node2 openais[3136]: [TOTEM] entering GATHER state > from 0. > Jul 10 14:13:09 node2 openais[3136]: [TOTEM] entering GATHER state > from 3. > Jul 10 14:13:09 node2 openais[3136]: [TOTEM] Creating commit token > because I am the rep. > Jul 10 14:13:09 node2 openais[3136]: [TOTEM] Saving state aru 16 high > seq received 16 > Jul 10 14:13:09 node2 openais[3136]: [TOTEM] Storing new sequence id > for ring 2d0 > Jul 10 14:13:09 node2 openais[3136]: [TOTEM] entering COMMIT state. > Jul 10 14:13:09 node2 openais[3136]: [TOTEM] entering RECOVERY state. > Jul 10 14:13:09 node2 openais[3136]: [TOTEM] position [0] member > 10.1.1.19: > Jul 10 14:13:09 node2 openais[3136]: [TOTEM] previous ring seq 716 rep > 10.1.1.18 > Jul 10 14:13:09 node2 openais[3136]: [TOTEM] aru 16 high delivered 16 > received flag 0 > Jul 10 14:13:09 node2 openais[3136]: [TOTEM] Did not need to originate > any messages in recovery. 
> Jul 10 14:13:09 node2 openais[3136]: [TOTEM] Sending initial ORF token > Jul 10 14:13:09 node2 openais[3136]: [CLM ] CLM CONFIGURATION CHANGE > Jul 10 14:13:09 node2 openais[3136]: [CLM ] New Configuration: > Jul 10 14:13:09 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.19) > Jul 10 14:13:09 node2 openais[3136]: [CLM ] Members Left: > Jul 10 14:13:09 node2 openais[3136]: [CLM ] no interface found for > nodeid > Jul 10 14:13:09 node2 openais[3136]: [CLM ] Members Joined: > Jul 10 14:13:09 node2 openais[3136]: [SYNC ] This node is within the > primary component and will provide service. > Jul 10 14:13:09 node2 openais[3136]: [CLM ] CLM CONFIGURATION CHANGE > Jul 10 14:13:09 node2 openais[3136]: [CLM ] New Configuration: > Jul 10 14:13:09 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.19) > Jul 10 14:13:09 node2 openais[3136]: [CLM ] Members Left: > Jul 10 14:13:09 node2 openais[3136]: [CLM ] Members Joined: > Jul 10 14:13:09 node2 openais[3136]: [SYNC ] This node is within the > primary component and will provide service. > Jul 10 14:13:09 node2 openais[3136]: [TOTEM] entering OPERATIONAL > state. > Jul 10 14:13:09 node2 openais[3136]: [CLM ] got nodejoin message > 10.1.1.19 > Jul 10 14:13:09 node2 openais[3136]: [CPG ] got joinlist message from > node 2 > Jul 10 14:22:54 node2 openais[3136]: [TOTEM] entering GATHER state > from 9. > Jul 10 14:22:54 node2 openais[3136]: [TOTEM] entering GATHER state > from 11. > Jul 10 14:22:54 node2 openais[3136]: [TOTEM] Saving state aru b high > seq received b > Jul 10 14:22:54 node2 openais[3136]: [TOTEM] Storing new sequence id > for ring 2d4 > Jul 10 14:22:54 node2 openais[3136]: [TOTEM] entering COMMIT state. > Jul 10 14:22:54 node2 openais[3136]: [TOTEM] entering RECOVERY state. 
> Jul 10 14:22:54 node2 openais[3136]: [TOTEM] position [0] member > 10.1.1.18: > Jul 10 14:22:54 node2 openais[3136]: [TOTEM] previous ring seq 720 rep > 10.1.1.18 > Jul 10 14:22:54 node2 openais[3136]: [TOTEM] aru b high delivered b > received flag 0 > Jul 10 14:22:54 node2 openais[3136]: [TOTEM] position [1] member > 10.1.1.19: > Jul 10 14:22:54 node2 openais[3136]: [TOTEM] previous ring seq 720 rep > 10.1.1.19 > Jul 10 14:22:54 node2 openais[3136]: [TOTEM] aru b high delivered b > received flag 0 > Jul 10 14:22:54 node2 openais[3136]: [TOTEM] position [2] member > 10.1.1.20: > Jul 10 14:22:54 node2 openais[3136]: [TOTEM] previous ring seq 720 rep > 10.1.1.20 > Jul 10 14:22:54 node2 openais[3136]: [TOTEM] aru b high delivered b > received flag 0 > Jul 10 14:22:54 node2 openais[3136]: [TOTEM] Did not need to originate > any messages in recovery. > Jul 10 14:22:54 node2 openais[3136]: [CLM ] CLM CONFIGURATION CHANGE > Jul 10 14:22:54 node2 openais[3136]: [CLM ] New Configuration: > Jul 10 14:22:54 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.19) > Jul 10 14:22:54 node2 openais[3136]: [CLM ] Members Left: > Jul 10 14:22:54 node2 gfs_controld[3164]: groupd_dispatch error -1 > errno 11 > Jul 10 14:22:54 node2 gfs_controld[3164]: groupd connection died > Jul 10 14:22:54 node2 gfs_controld[3164]: cluster is down, exiting > Jul 10 14:22:54 node2 dlm_controld[3158]: groupd is down, exiting > Jul 10 14:23:20 node2 ccsd[3130]: Unable to connect to cluster > infrastructure after 30 seconds. > Jul 10 14:23:50 node2 ccsd[3130]: Unable to connect to cluster > infrastructure after 60 seconds. > > *** node2 cman crashed *** > [root@node1 init.d]# cman_tool nodes > Node Sts Inc Joined Name > 1 M 704 2007-07-10 13:47:51 node1 > 2 X 708 node2 > 3 X 724 node3 > > [root@node2 init.d]# cman_tool nodes > cman_tool: Cannot open connection to cman, is it running ? 
> > [root@node3 init.d]# cman_tool nodes > Node Sts Inc Joined Name > 1 X 724 node1 > 2 X 724 node2 > 3 M 712 2007-07-10 14:07:13 node3 > > Jul 10 14:42:55 node1 openais[3166]: [CLM ] CLM CONFIGURATION CHANGE > Jul 10 14:42:55 node1 openais[3166]: [CLM ] New Configuration: > Jul 10 14:42:55 node1 openais[3166]: [CLM ] r(0) ip(10.1.1.18) > Jul 10 14:42:55 node1 openais[3166]: [CLM ] Members Left: > Jul 10 14:42:55 node1 openais[3166]: [CLM ] Members Joined: > Jul 10 14:42:55 node1 openais[3166]: [SYNC ] This node is within the > primary component and will provide service. > Jul 10 14:42:55 node1 openais[3166]: [CLM ] CLM CONFIGURATION CHANGE > Jul 10 14:42:55 node1 openais[3166]: [CLM ] New Configuration: > Jul 10 14:42:55 node1 openais[3166]: [CLM ] r(0) ip(10.1.1.18) > Jul 10 14:42:55 node1 openais[3166]: [CLM ] r(0) ip(10.1.1.19) > Jul 10 14:42:55 node1 openais[3166]: [CLM ] r(0) ip(10.1.1.20) > Jul 10 14:42:55 node1 openais[3166]: [CLM ] Members Left: > Jul 10 14:42:55 node1 openais[3166]: [CLM ] Members Joined: > Jul 10 14:42:55 node1 openais[3166]: [CLM ] r(0) ip(10.1.1.19) > Jul 10 14:42:55 node1 openais[3166]: [CLM ] r(0) ip(10.1.1.20) > Jul 10 14:42:55 node1 openais[3166]: [SYNC ] This node is within the > primary component and will provide service. > Jul 10 14:42:55 node1 openais[3166]: [TOTEM] entering OPERATIONAL > state. > Jul 10 14:42:55 node1 openais[3166]: [CLM ] got nodejoin message > 10.1.1.18 > Jul 10 14:42:55 node1 openais[3166]: [CLM ] got nodejoin message > 10.1.1.19 > Jul 10 14:42:55 node1 openais[3166]: [CLM ] got nodejoin message > 10.1.1.20 > Jul 10 14:42:55 node1 openais[3166]: [TOTEM] Retransmit List: c > Jul 10 14:42:55 node1 openais[3166]: [TOTEM] Retransmit List: c > Jul 10 14:42:55 node1 openais[3166]: [TOTEM] Retransmit List: c d > Jul 10 14:43:01 node1 last message repeated 47 times > Jul 10 14:43:21 node1 openais[3166]: [TOTEM] The token was lost in the > OPERATIONAL state. 
> Jul 10 14:43:21 node1 openais[3166]: [TOTEM] Receive multicast socket > recv buffer size (262142 bytes). > Jul 10 14:43:21 node1 openais[3166]: [TOTEM] Transmit multicast socket > send buffer size (262142 bytes). > Jul 10 14:43:21 node1 openais[3166]: [TOTEM] entering GATHER state > from 2. > Jul 10 14:43:26 node1 openais[3166]: [TOTEM] entering GATHER state > from 0. > Jul 10 14:43:26 node1 openais[3166]: [TOTEM] Creating commit token > because I am the rep. > Jul 10 14:43:26 node1 openais[3166]: [TOTEM] Saving state aru b high > seq received d > Jul 10 14:43:26 node1 openais[3166]: [TOTEM] Storing new sequence id > for ring 2ec > Jul 10 14:43:26 node1 openais[3166]: [TOTEM] entering COMMIT state. > Jul 10 14:43:26 node1 openais[3166]: [TOTEM] entering RECOVERY state. > Jul 10 14:43:26 node1 openais[3166]: [TOTEM] position [0] member > 10.1.1.18: > Jul 10 14:43:26 node1 openais[3166]: [TOTEM] previous ring seq 744 rep > 10.1.1.18 > Jul 10 14:43:26 node1 openais[3166]: [TOTEM] aru b high delivered b > received flag 0 > Jul 10 14:43:26 node1 openais[3166]: [TOTEM] copying all old ring > messages from c-d. > Jul 10 14:43:26 node1 openais[3166]: [TOTEM] Originated 0 messages in > RECOVERY. 
> Jul 10 14:43:26 node1 openais[3166]: [TOTEM] Originated for recovery: > Jul 10 14:43:26 node1 openais[3166]: [TOTEM] Not Originated for > recovery: c d > Jul 10 14:43:26 node1 openais[3166]: [TOTEM] Sending initial ORF token > Jul 10 14:43:26 node1 openais[3166]: [CLM ] CLM CONFIGURATION CHANGE > Jul 10 14:43:26 node1 openais[3166]: [CLM ] New Configuration: > Jul 10 14:43:26 node1 openais[3166]: [CLM ] r(0) ip(10.1.1.18) > Jul 10 14:43:26 node1 openais[3166]: [CLM ] Members Left: > Jul 10 14:43:26 node1 openais[3166]: [CLM ] no interface found for > nodeid > Jul 10 14:43:26 node1 openais[3166]: [CLM ] no interface found for > nodeid > Jul 10 14:43:26 node1 openais[3166]: [CLM ] Members Joined: > Jul 10 14:43:26 node1 openais[3166]: [CMAN ] quorum lost, blocking > activity > Jul 10 14:43:26 node1 openais[3166]: [SYNC ] This node is within the > primary component and will provide service. > Jul 10 14:43:26 node1 openais[3166]: [CLM ] CLM CONFIGURATION CHANGE > Jul 10 14:43:26 node1 openais[3166]: [CLM ] New Configuration: > Jul 10 14:43:26 node1 openais[3166]: [CLM ] r(0) ip(10.1.1.18) > Jul 10 14:43:26 node1 openais[3166]: [CLM ] Members Left: > Jul 10 14:43:26 node1 openais[3166]: [CLM ] Members Joined: > Jul 10 14:43:26 node1 openais[3166]: [SYNC ] This node is within the > primary component and will provide service. > Jul 10 14:43:26 node1 openais[3166]: [TOTEM] entering OPERATIONAL > state. > Jul 10 14:43:26 node1 openais[3166]: [CLM ] got nodejoin message > 10.1.1.18 > Jul 10 14:43:26 node1 openais[3166]: [CPG ] got joinlist message from > node 1 > > > > Let me know what else I can do to narrow this problem down. > > Thank you for the help :) > James > > > > Subject: RE: [Openais] Basic cluster not starting > > From: sdake@xxxxxxxxxx > > To: jamesanderson1@xxxxxxxxxxx > > CC: openais@xxxxxxxxxxxxxxxxxxxxxxxxxx; linux-cluster@xxxxxxxxxx > > Date: Mon, 9 Jul 2007 13:11:39 -0700 > > > > Explain crashes whole cluster? 
> > Could you send cman_tool nodes after fence but before the node
> > restarts? (ie: fence it then unplug its power cord or use the
> > power gui :)
> >
> > Thanks
> > -steve
> >
> >
> > On Mon, 2007-07-09 at 12:47 -0400, james anderson wrote:
> > > Steve/Patrick,
> > >
> > > Thanks for the replies :)
> > >
> > > I found the following FC6 x86_64 updates and applied them to all 3
> > > nodes:
> > > rpm -ivh xen-libs-3.0.3-9.fc6.x86_64.rpm
> > > rpm -ivh --nodeps libvirt-0.2.3-1.fc6.x86_64.rpm
> > > rpm -ivh bridge-utils-1.1-2.x86_64.rpm
> > > rpm -ivh libvirt-python-0.2.3-1.fc6.x86_64.rpm
> > > rpm -ivh python-virtinst-0.95.0-1.fc6.noarch.rpm
> > > rpm -ivh xen-3.0.3-9.fc6.x86_64.rpm
> > > rpm -Uvh cman-2.0.60-1.fc6.x86_64.rpm
> > >
> > > After installing these I triple-checked that the cluster.conf files
> > > are identical. I then rebooted them all and restarted the cman
> > > service. The good news is that the basic cluster now works! The bad
> > > news: fencing a node crashes the whole cluster, and conga also has
> > > some serious problems. I will post those in separate emails.
> > >
> > > Just wanted to tie up this thread for anyone else encountering the
> > > same problem. If anyone has had the same experience please post so
> > > my findings can be confirmed.
> > >
> > > Cheers,
> > > James
> > >
> > >
> > > > Subject: Re: [Openais] Basic cluster not starting
> > > > From: sdake@xxxxxxxxxx
> > > > To: jamesanderson1@xxxxxxxxxxx
> > > > CC: openais@xxxxxxxxxxxxxxxxxxxxxxxxxx; linux-cluster@xxxxxxxxxx
> > > > Date: Sat, 7 Jul 2007 18:06:07 -0700
> > > >
> > > > James,
> > > >
> > > > Let me speak with Patrick Caulfield on this topic Monday.
> > > >
> > > > I have not seen this before in any of our testing, but it is
> > > > possible someone else using RHCS has. I've also copied the
> > > > linux-cluster list.
> > > >
> > > > The problem appears to be, however, with something relating to ccs
> > > > or the startup order.
> > > > The openais code doesn't know anything about the ccsd node ids
> > > > or parsing of the xml configuration file. That work is done by
> > > > ccsd and cman.
> > > >
> > > > Did you try the cman init script?
> > > >
> > > > Regards
> > > > -steve
> > > >
> > > > On Thu, 2007-07-05 at 14:21 -0400, james anderson wrote:
> > > > > I am attempting to install GFS on FC6 64bit using RPMs.
> > > > > Below you will find my config and steps taken to get a GFS cluster
> > > > > running.
> > > > > I am unclear if the problem is with OpenAIS or RHCS.
> > > > >
> > > > >
> > > > > FC6 64bit RPMs
> > > > > --------------
> > > > > rpm -ivh openais-0.80.1-3.x86_64.rpm
> > > > > rpm -ivh perl-Net-Telnet-3.03-5.noarch.rpm
> > > > > rpm -ivh cman-2.0.18-2.fc6.x86_64.rpm
> > > > > System config cluster
> > > > > rpm -ivh system-config-cluster-1.0.29-1.0.noarch.rpm
> > > > > Luci
> > > > > rpm -ivh python-imaging-1.1.6-3.fc6.x86_64.rpm
> > > > > rpm -ivh zope-2.9.7-2.fc6.x86_64.rpm
> > > > > rpm -ivh plone-2.5.3-1.fc6.x86_64.rpm
> > > > > rpm -ivh luci-0.9.3-2.fc6.x86_64.rpm
> > > > > Ricci
> > > > > rpm -ivh --nodeps oddjob-libs-0.27-8.x86_64.rpm
> > > > > rpm -ivh oddjob-0.27-8.x86_64.rpm
> > > > > rpm -ivh modcluster-0.9.3-2.fc6.x86_64.rpm
> > > > > rpm -ivh ricci-0.9.3-2.fc6.x86_64.rpm
> > > > >
> > > > >
> > > > > /etc/cluster/cluster.conf
> > > > > -------------------------
> > > > > <?xml version="1.0"?>
> > > > > <cluster alias="alpha_cluster" config_version="8"
> > > > > name="alpha_cluster">
> > > > > <fence_daemon post_fail_delay="0" post_join_delay="3"/>
> > > > > <clusternodes>
> > > > > <clusternode name="node1" nodeid="1" votes="1">
> > > > > <multicast addr="239.192.196.121" interface="eth1"/>
> > > > > <fence>
> > > > > <method name="1">
> > > > > <device name="nps1" port="1" switch="1"/>
> > > > > </method>
> > > > > </fence>
> > > > > </clusternode>
> > > > > <clusternode name="node2" nodeid="2" votes="1">
> > > > > <multicast addr="239.192.196.121"
interface="eth0"/> > > > > > <fence> > > > > > <method name="1"> > > > > > <device name="nps1" port="2" switch="1"/> > > > > > </method> > > > > > </fence> > > > > > </clusternode> > > > > > <clusternode name="node3" nodeid="3" votes="1"> > > > > > <multicast addr="239.192.196.121" interface="eth2"/> > > > > > <fence> > > > > > <method name="1"> > > > > > <device name="nps1" port="3" switch="1"/> > > > > > </method> > > > > > </fence> > > > > > </clusternode> > > > > > </clusternodes> > > > > > <cman> > > > > > <multicast addr="239.192.196.121"/> > > > > > </cman> > > > > > <fencedevices> > > > > > <fencedevice agent="fence_apc" ipaddr="10.1.1.123" > login="root" > > > > > name="***" passwd="***"/> > > > > > </fencedevices> > > > > > <rm> > > > > > <failoverdomains/> > > > > > <resources/> > > > > > </rm> > > > > > </cluster> > > > > > > > > > > > > > > > Commands > > > > > -------- > > > > > # modprobe lock_dlm > > > > > # modprobe dlm > > > > > # mount -t configfs non /sys/kernel/config > > > > > # ccsd > > > > > # cman_tool join > > > > > > > > > > > > > > > /var/log/messages > > > > > ----------------- > > > > > 1 Jul 2 14:50:16 node1 ccsd[22457]: Starting ccsd 2.0.18: > > > > > 2 Jul 2 14:50:16 node1 ccsd[22457]: Built: Oct 1 2006 17:18:46 > > > > > 3 Jul 2 14:50:16 node1 ccsd[22457]: Copyright (C) Red Hat, > Inc. > > > 2004 > > > > > All rights reserved. > > > > > 4 Jul 2 14:50:45 node1 ccsd[22457]: Unable to connect to > cluster > > > > > infrastructure after 30 seconds. > > > > > 5 Jul 2 14:51:15 node1 ccsd[22457]: Unable to connect to > cluster > > > > > infrastructure after 60 seconds. > > > > > 6 Jul 2 14:51:39 node1 ccsd[22457]: cluster.conf (cluster name > = > > > > > alpha_cluster, version = 6) found. 
> > > > > 7 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] AIS Executive Service RELEASE 'subrev 1204 version 0.80.1'
> > > > > 8 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
> > > > > 9 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
> > > > > 10 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] No nodeid specified in cluster.conf
> > > > > 11 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] Error reading CCS info, cannot start
> > > > > 12 Jul 2 14:51:41 node1 openais[22542]: [MAIN ]
> > > > > 13 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] AIS Executive exiting (-9).
> > > > > 14 Jul 2 14:51:45 node1 ccsd[22457]: Unable to connect to cluster infrastructure after 90 seconds.
> > > > > 15 Jul 2 14:52:15 node1 ccsd[22457]: Unable to connect to cluster infrastructure after 120 seconds.
> > > > > 16 Jul 2 14:52:44 node1 ccsd[22457]: Stopping ccsd, SIGTERM received.
> > > > >
> > > > > Lines 1-6 are from running the "ccsd" command above.
> > > > > Lines 7-13 are from running the "cman_tool join" command above.
> > > > >
> > > > > I also received the following error message:
> > > > > cman not started: CCS does not have a nodeid for this node, run
> > > > > 'ccs_tool addnodeids' to fix
> > > > > cman_tool: aisexec daemon didn't start
> > > > >
> > > > > Yes I did try running the ccs_tool addnodeids. It did not help.
> > > > > Notice in the cluster.conf the nodeids were already in place. Any
> > > > > pointers to narrowing down my problem are appreciated.
> > > > >
> > > > > Thanks,
> > > > > James
> > > > >
> > > > > _______________________________________________
> > > > > Openais mailing list
> > > > > Openais@xxxxxxxxxxxxxxxxxxxxxxxxxx
> > > > > https://lists.linux-foundation.org/mailman/listinfo/openais

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
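
For anyone following the quorum question in this thread (three nodes with votes="1" each, node 3 shut down, the remaining two expected to stay quorate), the arithmetic can be sketched as below. This is a minimal Python sketch, assuming the usual majority rule (quorum = expected_votes // 2 + 1) and ignoring special cases such as two_node mode or quorum disks. The function names are hypothetical and are not part of cman or openais.

```python
# Hypothetical helpers illustrating majority-based quorum arithmetic.
# Assumption: quorum threshold = expected_votes // 2 + 1 (simple majority).

def quorum_threshold(expected_votes: int) -> int:
    """Smallest number of votes that forms a majority of expected_votes."""
    return expected_votes // 2 + 1


def is_quorate(member_votes, expected_votes: int) -> bool:
    """True if the votes of the current members reach the quorum threshold."""
    return sum(member_votes) >= quorum_threshold(expected_votes)


# Votes as configured in the cluster.conf quoted in this thread.
votes = {"node1": 1, "node2": 1, "node3": 1}
expected = sum(votes.values())

# node3 shut down, node1 and node2 still members of the same partition.
two_left = [votes["node1"], votes["node2"]]
print("threshold:", quorum_threshold(expected))
print("node1+node2 quorate:", is_quorate(two_left, expected))

# A node that ends up alone in its own partition.
print("single node quorate:", is_quorate([votes["node1"]], expected))
```

Under these assumptions two surviving nodes should keep quorum, so the "quorum lost, blocking activity" messages in the logs are consistent with each node forming a single-member configuration on its own, rather than with node 3's vote being required.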