Can you explain "crashes the whole cluster"?

Could you send the output of cman_tool nodes after the fence but before the
node restarts? (i.e., fence it, then unplug its power cord or use the power GUI :)

Thanks
-steve

On Mon, 2007-07-09 at 12:47 -0400, james anderson wrote:
> Steve/Patrick,
>
> Thanks for the replies :)
>
> I found the following FC6 x86_64 updates and applied them to all 3
> nodes:
> rpm -ivh xen-libs-3.0.3-9.fc6.x86_64.rpm
> rpm -ivh --nodeps libvirt-0.2.3-1.fc6.x86_64.rpm
> rpm -ivh bridge-utils-1.1-2.x86_64.rpm
> rpm -ivh libvirt-python-0.2.3-1.fc6.x86_64.rpm
> rpm -ivh python-virtinst-0.95.0-1.fc6.noarch.rpm
> rpm -ivh xen-3.0.3-9.fc6.x86_64.rpm
> rpm -Uvh cman-2.0.60-1.fc6.x86_64.rpm
>
> After installing these I triple checked that the cluster.conf files
> are identical. I then rebooted them all and restarted the cman
> service. The good news is that the basic cluster now works! The bad
> news: fencing a node crashes the whole cluster, also conga has some
> serious problems. I will post those in separate emails.
>
> Just wanted to tie up this thread for anyone else encountering the
> same problem. If anyone has had the same experience please post so my
> findings can be confirmed.
>
> Cheers,
> James
>
>
> > Subject: Re: [Openais] Basic cluster not starting
> > From: sdake@xxxxxxxxxx
> > To: jamesanderson1@xxxxxxxxxxx
> > CC: openais@xxxxxxxxxxxxxxxxxxxxxxxxxx; linux-cluster@xxxxxxxxxx
> > Date: Sat, 7 Jul 2007 18:06:07 -0700
> >
> > James,
> >
> > Let me speak with Patrick Caulfield on this topic Monday.
> >
> > I have not seen this before in any of our testing, but it is possible
> > someone else using RHCS has. I've also copied the linux-cluster list.
> >
> > The problem appears to be, however, with something relating to ccs or
> > the startup order. The openais code doesn't know anything about the
> > ccsd node ids or parsing of the xml configuration file. That work is
> > done by ccsd and cman.
> >
> > Did you try the cman init script?
> >
> > Regards
> > -steve
> >
> > On Thu, 2007-07-05 at 14:21 -0400, james anderson wrote:
> > > I am attempting to install GFS on FC6 64bit using RPMs.
> > > Below you will find my config and steps taken to get a GFS cluster
> > > running.
> > > I am unclear if the problem is with OpenAIS or RHCS.
> > >
> > >
> > > FC6 64bit RPMs
> > > --------------
> > > rpm -ivh openais-0.80.1-3.x86_64.rpm
> > > rpm -ivh perl-Net-Telnet-3.03-5.noarch.rpm
> > > rpm -ivh cman-2.0.18-2.fc6.x86_64.rpm
> > > System config cluster
> > > rpm -ivh system-config-cluster-1.0.29-1.0.noarch.rpm
> > > Luci
> > > rpm -ivh python-imaging-1.1.6-3.fc6.x86_64.rpm
> > > rpm -ivh zope-2.9.7-2.fc6.x86_64.rpm
> > > rpm -ivh plone-2.5.3-1.fc6.x86_64.rpm
> > > rpm -ivh luci-0.9.3-2.fc6.x86_64.rpm
> > > Ricci
> > > rpm -ivh --nodeps oddjob-libs-0.27-8.x86_64.rpm
> > > rpm -ivh oddjob-0.27-8.x86_64.rpm
> > > rpm -ivh modcluster-0.9.3-2.fc6.x86_64.rpm
> > > rpm -ivh ricci-0.9.3-2.fc6.x86_64.rpm
> > >
> > >
> > > /etc/cluster/cluster.conf
> > > -------------------------
> > > <?xml version="1.0"?>
> > > <cluster alias="alpha_cluster" config_version="8" name="alpha_cluster">
> > >   <fence_daemon post_fail_delay="0" post_join_delay="3"/>
> > >   <clusternodes>
> > >     <clusternode name="node1" nodeid="1" votes="1">
> > >       <multicast addr="239.192.196.121" interface="eth1"/>
> > >       <fence>
> > >         <method name="1">
> > >           <device name="nps1" port="1" switch="1"/>
> > >         </method>
> > >       </fence>
> > >     </clusternode>
> > >     <clusternode name="node2" nodeid="2" votes="1">
> > >       <multicast addr="239.192.196.121" interface="eth0"/>
> > >       <fence>
> > >         <method name="1">
> > >           <device name="nps1" port="2" switch="1"/>
> > >         </method>
> > >       </fence>
> > >     </clusternode>
> > >     <clusternode name="node3" nodeid="3" votes="1">
> > >       <multicast addr="239.192.196.121" interface="eth2"/>
> > >       <fence>
> > >         <method name="1">
> > >           <device name="nps1" port="3" switch="1"/>
> > >         </method>
> > >       </fence>
> > >     </clusternode>
> > >   </clusternodes>
> > >   <cman>
> > >     <multicast addr="239.192.196.121"/>
> > >   </cman>
> > >   <fencedevices>
> > >     <fencedevice agent="fence_apc" ipaddr="10.1.1.123" login="root" name="***" passwd="***"/>
> > >   </fencedevices>
> > >   <rm>
> > >     <failoverdomains/>
> > >     <resources/>
> > >   </rm>
> > > </cluster>
> > >
> > >
> > > Commands
> > > --------
> > > # modprobe lock_dlm
> > > # modprobe dlm
> > > # mount -t configfs non /sys/kernel/config
> > > # ccsd
> > > # cman_tool join
> > >
> > >
> > > /var/log/messages
> > > -----------------
> > > 1 Jul 2 14:50:16 node1 ccsd[22457]: Starting ccsd 2.0.18:
> > > 2 Jul 2 14:50:16 node1 ccsd[22457]: Built: Oct 1 2006 17:18:46
> > > 3 Jul 2 14:50:16 node1 ccsd[22457]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
> > > 4 Jul 2 14:50:45 node1 ccsd[22457]: Unable to connect to cluster infrastructure after 30 seconds.
> > > 5 Jul 2 14:51:15 node1 ccsd[22457]: Unable to connect to cluster infrastructure after 60 seconds.
> > > 6 Jul 2 14:51:39 node1 ccsd[22457]: cluster.conf (cluster name = alpha_cluster, version = 6) found.
> > > 7 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] AIS Executive Service RELEASE 'subrev 1204 version 0.80.1'
> > > 8 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
> > > 9 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
> > > 10 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] No nodeid specified in cluster.conf
> > > 11 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] Error reading CCS info, cannot start
> > > 12 Jul 2 14:51:41 node1 openais[22542]: [MAIN ]
> > > 13 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] AIS Executive exiting (-9).
> > > 14 Jul 2 14:51:45 node1 ccsd[22457]: Unable to connect to cluster infrastructure after 90 seconds.
> > > 15 Jul 2 14:52:15 node1 ccsd[22457]: Unable to connect to cluster infrastructure after 120 seconds.
> > > 16 Jul 2 14:52:44 node1 ccsd[22457]: Stopping ccsd, SIGTERM received.
> > >
> > > Lines 1-6 are from running the "ccsd" command above.
> > > Lines 7-13 are from running the "cman_tool join" command above.
> > >
> > > I also received the following error message:
> > > cman not started: CCS does not have a nodeid for this node, run 'ccs_tool addnodeids' to fix
> > > cman_tool: aisexec daemon didn't start
> > >
> > > Yes, I did try running ccs_tool addnodeids. It did not help. Notice
> > > that in the cluster.conf the nodeids were already in place. Any
> > > pointers to narrowing down my problem are appreciated.
> > >
> > > Thanks,
> > > James
> > >
> > > _______________________________________________
> > > Openais mailing list
> > > Openais@xxxxxxxxxxxxxxxxxxxxxxxxxx
> > > https://lists.linux-foundation.org/mailman/listinfo/openais

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
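
For anyone comparing their own setup against this thread: the posted cluster.conf carries config_version="8", while log line 6 shows ccsd finding "version = 6". Whether that difference had anything to do with the "No nodeid specified" error is not established anywhere in the thread (the reported fix was the package update), so treat the following only as a rough sanity check, not a diagnosis. It is a minimal sketch in Python; the function name check_cluster_conf, the logged_version parameter and the trimmed SAMPLE document are all made up for illustration and are not part of cman, ccsd or openais.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf posted in this thread (fence and rm
# sections left out for brevity). SAMPLE is a hypothetical name.
SAMPLE = """<?xml version="1.0"?>
<cluster alias="alpha_cluster" config_version="8" name="alpha_cluster">
  <clusternodes>
    <clusternode name="node1" nodeid="1" votes="1"/>
    <clusternode name="node2" nodeid="2" votes="1"/>
    <clusternode name="node3" nodeid="3" votes="1"/>
  </clusternodes>
</cluster>
"""


def check_cluster_conf(xml_text, logged_version=None):
    """Return a list of findings for a cluster.conf document.

    logged_version is the version number a daemon reported in its log
    (an assumption of this sketch: you copy it in by hand).
    """
    root = ET.fromstring(xml_text)
    findings = []

    # Compare the file's config_version with what the log reported.
    version = root.get("config_version")
    if logged_version is not None and version != str(logged_version):
        findings.append(
            f"config_version is {version} but ccsd logged version {logged_version}"
        )

    # Every clusternode should carry a nodeid, and nodeids should be unique.
    seen = {}
    for node in root.findall("./clusternodes/clusternode"):
        name = node.get("name")
        nodeid = node.get("nodeid")
        if nodeid is None:
            findings.append(f"{name}: no nodeid attribute")
        elif nodeid in seen:
            findings.append(f"{name}: nodeid {nodeid} already used by {seen[nodeid]}")
        else:
            seen[nodeid] = name
    return findings


if __name__ == "__main__":
    for finding in check_cluster_conf(SAMPLE, logged_version=6):
        print(finding)
```

To check a real node, read /etc/cluster/cluster.conf into a string and pass it in place of SAMPLE. Running it on every node also gives a quick way to confirm that the copies really are identical.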