Sorry, but security here will not allow me to send host files. BUT: I was getting this in /var/log/messages on csarcsys3:

Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate. Refusing connection.
Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Error while processing connect: Connection refused
Mar 25 15:26:12 csarcsys3-eth0 dlm_controld[7476]: connect to ccs error -111, check ccsd or cluster status
Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate. Refusing connection.
Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Error while processing connect: Connection refused

I had /dev/vg0/gfsvol on these systems. I did an lvremove, restarted cman on all systems, and for some strange reason my clusters are now working. It doesn't make any sense. I can't thank you enough for your help!

Thanks.

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Tuesday, March 25, 2008 10:27 AM
To: linux clustering
Subject: Re: 3 node cluster problems

I am currently running several 3-node clusters without a quorum disk. However, if you want your cluster to keep running when only one node is up, you will need a quorum disk.

Can you send your /etc/hosts file for all systems? Also, could there be another node named csarcsys3-eth0 in your NIS or DNS?

I configured some clusters using Conga and some with system-config-cluster. When using system-config-cluster I basically run the config on all nodes, just adding the node names and the cluster name. I reboot all nodes to make sure they see each other, then go back and modify the config files. The file /var/log/messages should also shed some light on the problem.
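For readers following along: the /etc/hosts entries being asked about would normally list every cluster node, identically on all three systems, along the lines of the sketch below. The addresses are placeholders and the domain is masked the same way as elsewhere in this thread; only the pattern matters.

# /etc/hosts -- same three lines on every node (addresses here are made up)
10.0.0.101   csarcsys1-eth0   csarcsys1-eth0.xxx.xxxx.nasa.gov
10.0.0.102   csarcsys2-eth0   csarcsys2-eth0.xxx.xxxx.nasa.gov
10.0.0.103   csarcsys3-eth0   csarcsys3-eth0.xxx.xxxx.nasa.gov

# quick consistency check -- getent consults /etc/hosts, NIS and DNS in nsswitch order,
# so running this on each node shows whether some other source (e.g. a stray DNS or NIS
# record for csarcsys3-eth0) resolves a name differently:
getent hosts csarcsys1-eth0 csarcsys2-eth0 csarcsys3-eth0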
Dalton, Maurice wrote:
>
> Same problem.
>
> I now have qdiskd running.
>
> I have run diffs on all three cluster.conf files; all are the same.
>
> [root@csarcsys1-eth0 cluster]# more cluster.conf
> <?xml version="1.0"?>
> <cluster config_version="6" name="csarcsys5">
>         <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>         <clusternodes>
>                 <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
>                         <fence/>
>                 </clusternode>
>                 <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
>                         <fence/>
>                 </clusternode>
>                 <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
>                         <fence/>
>                 </clusternode>
>         </clusternodes>
>         <cman/>
>         <fencedevices/>
>         <rm>
>                 <failoverdomains>
>                         <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
>                                 <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
>                                 <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
>                                 <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
>                         </failoverdomain>
>                 </failoverdomains>
>                 <resources>
>                         <ip address="172.24.86.177" monitor_link="1"/>
>                         <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
>                 </resources>
>         </rm>
>         <quorumd interval="4" label="csarcsysQ" min_score="1" tko="30" votes="2"/>
> </cluster>
>
> More info from csarcsys3:
>
> [root@csarcsys3-eth0 cluster]# clustat
> msg_open: No such file or directory
> Member Status: Inquorate
>
>   Member Name                  ID   Status
>   ------ ----                  ---- ------
>   csarcsys1-eth0                  1 Offline
>   csarcsys2-eth0                  2 Offline
>   csarcsys3-eth0                  3 Online, Local
>   /dev/sdd1                       0 Offline
>
> [root@csarcsys3-eth0 cluster]# mkqdisk -L
> mkqdisk v0.5.1
> /dev/sdd1:
>         Magic:   eb7a62c2
>         Label:   csarcsysQ
>         Created: Wed Feb 13 13:44:35 2008
>         Host:    csarcsys1-eth0.xxx.xxx.nasa.gov
>
> [root@csarcsys3-eth0 cluster]# ls -l /dev/sdd1
> brw-r----- 1 root disk 8, 49 Mar 25 14:09 /dev/sdd1
>
> clustat from csarcsys1:
>
> msg_open: No such file or directory
> Member Status: Quorate
>
>   Member Name                  ID   Status
>   ------ ----                  ---- ------
>   csarcsys1-eth0                  1 Online, Local
>   csarcsys2-eth0                  2 Online
>   csarcsys3-eth0                  3 Offline
>   /dev/sdd1                       0 Offline, Quorum Disk
>
> [root@csarcsys1-eth0 cluster]# ls -l /dev/sdd1
> brw-r----- 1 root disk 8, 49 Mar 25 14:19 /dev/sdd1
>
> mkqdisk v0.5.1
> /dev/sdd1:
>         Magic:   eb7a62c2
>         Label:   csarcsysQ
>         Created: Wed Feb 13 13:44:35 2008
>         Host:    csarcsys1-eth0.xxx.xxx.nasa.gov
>
> Info from csarcsys2:
>
> [root@csarcsys2-eth0 cluster]# clustat
> msg_open: No such file or directory
> Member Status: Quorate
>
>   Member Name                  ID   Status
>   ------ ----                  ---- ------
>   csarcsys1-eth0                  1 Offline
>   csarcsys2-eth0                  2 Online, Local
>   csarcsys3-eth0                  3 Offline
>   /dev/sdd1                       0 Online, Quorum Disk
>
> From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Panigrahi, Santosh Kumar
> Sent: Tuesday, March 25, 2008 7:33 AM
> To: linux clustering
> Subject: RE: 3 node cluster problems
>
> If you are configuring your cluster with system-config-cluster then there is no need to run ricci/luci. Ricci/luci are only needed when configuring the cluster with Conga. You can configure it either way.
>
> Looking at your clustat outputs, it seems the cluster is partitioned (split brain) into two sub-clusters [1: csarcsys1-eth0, csarcsys2-eth0; 2: csarcsys3-eth0]. Without a quorum device you can face this situation more often. To avoid it, you can configure a quorum device with a heuristic such as a ping test.
> Use the link (http://www.redhatmagazine.com/2007/12/19/enhancing-cluster-quorum-with-qdisk/) for configuring a quorum disk in RHCS.
>
> Thanks,
> S
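As a concrete sketch of what Santosh is describing, the existing <quorumd> line in the config above could be given a ping heuristic roughly as follows. The gateway address is a placeholder (pick a router every healthy node should always reach) and the score/interval/tko numbers are illustrative, not recommendations:

        <!-- sketch only: replace 172.24.86.1 with an address all healthy nodes can ping -->
        <quorumd interval="4" label="csarcsysQ" min_score="1" tko="30" votes="2">
                <heuristic program="ping -c1 -w1 172.24.86.1" score="1" interval="2" tko="4"/>
        </quorumd>

With three one-vote nodes plus a two-vote quorum disk the total is five votes, so quorum is three, and a single surviving node that still owns the quorum disk stays quorate; that is exactly the "run with only one node up" case Bennie mentioned at the top of the thread.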
> -----Original Message-----
> From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Dalton, Maurice
> Sent: Tuesday, March 25, 2008 5:18 PM
> To: linux clustering
> Subject: RE: 3 node cluster problems
>
> Still no change. Same as below.
>
> I completely rebuilt the cluster using system-config-cluster.
>
> The cluster software was installed from RHN; luci and ricci are running.
>
> This is the new config file and it has been copied to the 2 other systems:
>
> [root@csarcsys1-eth0 cluster]# more cluster.conf
> <?xml version="1.0"?>
> <cluster config_version="5" name="csarcsys5">
>         <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>         <clusternodes>
>                 <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
>                         <fence/>
>                 </clusternode>
>                 <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
>                         <fence/>
>                 </clusternode>
>                 <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
>                         <fence/>
>                 </clusternode>
>         </clusternodes>
>         <cman/>
>         <fencedevices/>
>         <rm>
>                 <failoverdomains>
>                         <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
>                                 <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
>                                 <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
>                                 <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
>                         </failoverdomain>
>                 </failoverdomains>
>                 <resources>
>                         <ip address="172.xx.xx.xxx" monitor_link="1"/>
>                         <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
>                 </resources>
>         </rm>
> </cluster>
>
> > -----Original Message-----
> > From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
> > Sent: Monday, March 24, 2008 4:17 PM
> > To: linux clustering
> > Subject: Re: 3 node cluster problems
> >
> > Did you load the cluster software via Conga or manually? You would have had to load luci on one node and ricci on all three.
> >
> > Try copying the modified /etc/cluster/cluster.conf from csarcsys1 to the other two nodes.
> >
> > Make sure you can ping the private interface to/from all nodes and reboot. If this does not work, post your /etc/cluster/cluster.conf file again.
> >
> > Dalton, Maurice wrote:
> > > Yes
> > >
> > > I also rebooted again just now to be sure.
> > >
> > > -----Original Message-----
> > > From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
> > > Sent: Monday, March 24, 2008 3:33 PM
> > > To: linux clustering
> > > Subject: Re: 3 node cluster problems
> > >
> > > When you changed the node names in /etc/cluster/cluster.conf and made sure the /etc/hosts file had the correct node names (i.e. 10.0.0.100 csarcsys1-eth0 csarcsys1-eth0.xxxx.xxxx.xxx), did you reboot all the nodes at the same time?
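For the copy-and-verify step Bennie suggests a couple of messages up (push the edited /etc/cluster/cluster.conf from csarcsys1 to the other nodes and confirm the nodes can reach each other on the cluster interface), a rough sketch run from csarcsys1 might look like this; the host names are the ones used elsewhere in the thread:

# copy the edited config to the other two nodes
for n in csarcsys2-eth0 csarcsys3-eth0; do
        scp /etc/cluster/cluster.conf root@$n:/etc/cluster/cluster.conf
done

# confirm all three copies are identical (the checksums must match)
for n in csarcsys1-eth0 csarcsys2-eth0 csarcsys3-eth0; do
        ssh root@$n cksum /etc/cluster/cluster.conf
done

# confirm the cluster interface of every node answers from here
for n in csarcsys1-eth0 csarcsys2-eth0 csarcsys3-eth0; do
        ping -c1 $n
done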
> > > Dalton, Maurice wrote:
> > >> No luck. It seems as if csarcsys3 thinks it's in its own cluster.
> > >>
> > >> I renamed all config files and rebuilt from system-config-cluster.
> > >>
> > >> clustat command from csarcsys3:
> > >>
> > >> [root@csarcsys3-eth0 cluster]# clustat
> > >> msg_open: No such file or directory
> > >> Member Status: Inquorate
> > >>
> > >>   Member Name                  ID   Status
> > >>   ------ ----                  ---- ------
> > >>   csarcsys1-eth0                  1 Offline
> > >>   csarcsys2-eth0                  2 Offline
> > >>   csarcsys3-eth0                  3 Online, Local
> > >>
> > >> clustat command from csarcsys2:
> > >>
> > >> [root@csarcsys2-eth0 cluster]# clustat
> > >> msg_open: No such file or directory
> > >> Member Status: Quorate
> > >>
> > >>   Member Name                  ID   Status
> > >>   ------ ----                  ---- ------
> > >>   csarcsys1-eth0                  1 Online
> > >>   csarcsys2-eth0                  2 Online, Local
> > >>   csarcsys3-eth0                  3 Offline
> > >>
> > >> -----Original Message-----
> > >> From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
> > >> Sent: Monday, March 24, 2008 2:25 PM
> > >> To: linux clustering
> > >> Subject: Re: 3 node cluster problems
> > >>
> > >> You will also need to make sure the clustered node names are in your /etc/hosts file.
> > >> Also, make sure your cluster network interface is up on all nodes and that /etc/cluster/cluster.conf is the same on all nodes.
> > >>
> > >> Dalton, Maurice wrote:
> > >>> The last post is incorrect.
> > >>>
> > >>> Fence is still hanging at startup.
> > >>>
> > >>> Here's another log message:
> > >>>
> > >>> Mar 24 19:03:14 csarcsys3-eth0 ccsd[6425]: Error while processing connect: Connection refused
> > >>> Mar 24 19:03:15 csarcsys3-eth0 dlm_controld[6453]: connect to ccs error -111, check ccsd or cluster status
> > >>>
> > >>> From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
> > >>> Sent: Monday, March 24, 2008 11:22 AM
> > >>> To: linux clustering
> > >>> Subject: Re: 3 node cluster problems
> > >>>
> > >>> try removing the fully qualified hostname from the cluster.conf file.
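To make that suggestion concrete: the change is just to shorten the clusternode (and matching failoverdomainnode) names in cluster.conf from the fully qualified form used in the original config quoted below to the bare host names the nodes know each other by in /etc/hosts, roughly:

        <!-- before (fully qualified, as in the original config below) -->
        <clusternode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" nodeid="1" votes="1">
                <fence/>
        </clusternode>

        <!-- after (short name matching the /etc/hosts entry for the cluster interface) -->
        <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
                <fence/>
        </clusternode>

When editing the file by hand like this, the config_version attribute also has to be bumped and the same file copied to every node, which is what the later configs in this thread (config_version 5 and 6, with the short names) already do.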
> > >>> Dalton, Maurice wrote:
> > >>> I have NO fencing equipment.
> > >>>
> > >>> I have been tasked to set up a 3-node cluster.
> > >>>
> > >>> Currently I am having problems getting cman (fence) to start. Fence tries to start during cman startup but fails.
> > >>>
> > >>> I tried to run /sbin/fenced -D and I get the following:
> > >>>
> > >>> 1206373475 cman_init error 0 111
> > >>>
> > >>> Here's my cluster.conf file:
> > >>>
> > >>> <?xml version="1.0"?>
> > >>> <cluster alias="csarcsys51" config_version="26" name="csarcsys51">
> > >>>         <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
> > >>>         <clusternodes>
> > >>>                 <clusternode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" nodeid="1" votes="1">
> > >>>                         <fence/>
> > >>>                 </clusternode>
> > >>>                 <clusternode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" nodeid="2" votes="1">
> > >>>                         <fence/>
> > >>>                 </clusternode>
> > >>>                 <clusternode name="csarcsys3-eth0.xxx.xxxxnasa.gov" nodeid="3" votes="1">
> > >>>                         <fence/>
> > >>>                 </clusternode>
> > >>>         </clusternodes>
> > >>>         <cman/>
> > >>>         <fencedevices/>
> > >>>         <rm>
> > >>>                 <failoverdomains>
> > >>>                         <failoverdomain name="csarcsys-fo" ordered="1" restricted="0">
> > >>>                                 <failoverdomainnode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" priority="1"/>
> > >>>                                 <failoverdomainnode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" priority="1"/>
> > >>>                                 <failoverdomainnode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" priority="1"/>
> > >>>                         </failoverdomain>
> > >>>                 </failoverdomains>
> > >>>                 <resources>
> > >>>                         <ip address="xxx.xxx.xxx.xxx" monitor_link="1"/>
> > >>>                         <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
> > >>>                         <nfsexport name="csarcsys-export"/>
> > >>>                         <nfsclient name="csarcsys-nfs-client" options="no_root_squash,rw" path="/csarc-test" target="xxx.xxx.xxx.*"/>
> > >>>                 </resources>
> > >>>         </rm>
> > >>> </cluster>
> > >>>
> > >>> Messages from the logs:
> > >>>
> > >>> Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
> > >>> Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
> > >>> Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
> > >>> Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
> > >>> Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
> > >>> Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
> > >>> Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
> > >>> Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
> > >>> Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
> > >>> Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
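A closing note on the loop in that log: errno 111 is ECONNREFUSED, so "connect to ccs error -111" and "Error while processing connect: Connection refused" are the same event seen from both sides. ccsd is refusing the connection precisely because the cluster has not reached quorum, and with three one-vote nodes and no quorum disk that takes at least two nodes joining. While it is happening, the membership and vote counts can be checked with the standard cman tools; a minimal sketch:

# run on each node while cman is coming up (both tools ship with the cman package)
cman_tool status   # node count, expected/total votes and the quorum value for this node's view
cman_tool nodes    # which members this particular node can actually see
clustat            # the cluster/service view already used elsewhere in this thread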
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster