Are you using a private VLAN for your cluster communications? If not,
you should be; the communication between the clustered nodes is very
chatty. Just my opinion.
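If you want to confirm which network the cluster traffic is actually on, a quick check like this can help (a rough sketch using the stock cman tools; eth1 and a 10.0.0.0/24 private network are only hypothetical examples):
cman_tool status | grep -i "node addresses"   # address cman is using for cluster traffic
ip addr show eth1                             # confirm that address sits on the private VLAN interface
If the address cman reports is on your public network, the heartbeat traffic is sharing bandwidth with everything else.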
These are my opinions and experiences.
Any views or opinions presented are solely those of the author and do not necessarily
represent those of Raytheon unless specifically stated.
Electronic communications, including email, might be monitored by Raytheon
for operational or business reasons.
Dalton, Maurice wrote:
Cisco 3550
-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Thursday, March 27, 2008 9:53 AM
To: linux clustering
Subject: Re: 3 node cluster problems
What is the switch brand? I have read that RHCS has problems with
certain switches.
Dalton, Maurice wrote:
Switches
Storage is fiber
-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Thursday, March 27, 2008 9:04 AM
To: linux clustering
Subject: Re: 3 node cluster problems
How are your cluster connections made (i.e., are you using a
hub, a switch, or directly connected heartbeat cables)?
Dalton, Maurice wrote:
Still having the problem; I can't figure it out.
I just upgraded to the latest 5.1 cman. No help!
-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Tuesday, March 25, 2008 10:57 AM
To: linux clustering
Subject: Re: 3 node cluster problems
Glad they are working. I have not used LVM with our clusters. You know,
you have piqued my curiosity, and I will have to try building one. So were
you also using GFS?
Dalton, Maurice wrote:
Sorry, but security here will not allow me to send host files.
However, I was getting this in /var/log/messages on csarcsys3:
Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate. Refusing connection.
Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Error while processing connect: Connection refused
Mar 25 15:26:12 csarcsys3-eth0 dlm_controld[7476]: connect to ccs error -111, check ccsd or cluster status
Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate. Refusing connection.
Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Error while processing connect: Connection refused
I had /dev/vg0/gfsvol on these systems. I did an lvremove, restarted cman
on all systems, and for some strange reason my clusters are working.
It doesn't make any sense.
I can't thank you enough for your help!
Thanks.
-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Tuesday, March 25, 2008 10:27 AM
To: linux clustering
Subject: Re: 3 node cluster problems
I am currently running several 3-node clusters without a quorum disk.
However, if you want your cluster to run when only one node is up, then
you will need a quorum disk. Can you send your /etc/hosts file for all
systems? Also, could there be another node named csarcsys3-eth0 in your
NIS or DNS?
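To put rough numbers on the quorum point (assuming the usual floor(expected_votes/2) + 1 rule): with three nodes at votes="1" each, expected_votes is 3 and quorum is 2, so a lone surviving node is inquorate. With a quorum disk worth votes="2", as in the quorumd line in your config below, expected_votes rises to 5 and quorum to 3, so one node plus the qdisk (1 + 2 = 3) can stay quorate on its own.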
I configured some using Conga and some with system-config-cluster. When
using system-config-cluster, I basically run the config tool on all nodes,
just adding the node names and the cluster name. I reboot all nodes to make
sure they see each other, then go back and modify the config files.
The file /var/log/messages should also shed some light on the problem.
Dalton, Maurice wrote:
Same problem. I now have qdiskd running.
I have run diffs on all three cluster.conf files; all are the same.
[root@csarcsys1-eth0 cluster]# more cluster.conf
<?xml version="1.0"?>
<cluster config_version="6" name="csarcsys5">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices/>
        <rm>
                <failoverdomains>
                        <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                                <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
                                <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
                                <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="172.24.86.177" monitor_link="1"/>
                        <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
                </resources>
        </rm>
        <quorumd interval="4" label="csarcsysQ" min_score="1" tko="30" votes="2"/>
</cluster>
More info from csarcsys3
[root@csarcsys3-eth0 cluster]# clustat
msg_open: No such file or directory
Member Status: Inquorate
Member Name ID Status
------ ---- ---- ------
csarcsys1-eth0 1 Offline
csarcsys2-eth0 2 Offline
csarcsys3-eth0 3 Online, Local
/dev/sdd1 0 Offline
[root@csarcsys3-eth0 cluster]# mkqdisk -L
mkqdisk v0.5.1
/dev/sdd1:
Magic: eb7a62c2
Label: csarcsysQ
Created: Wed Feb 13 13:44:35 2008
Host: csarcsys1-eth0.xxx.xxx.nasa.gov
[root@csarcsys3-eth0 cluster]# ls -l /dev/sdd1
brw-r----- 1 root disk 8, 49 Mar 25 14:09 /dev/sdd1
clustat from csarcsys1
msg_open: No such file or directory
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
csarcsys1-eth0 1 Online, Local
csarcsys2-eth0 2 Online
csarcsys3-eth0 3 Offline
/dev/sdd1 0 Offline, Quorum Disk
[root@csarcsys1-eth0 cluster]# ls -l /dev/sdd1
brw-r----- 1 root disk 8, 49 Mar 25 14:19 /dev/sdd1
mkqdisk v0.5.1
/dev/sdd1:
Magic: eb7a62c2
Label: csarcsysQ
Created: Wed Feb 13 13:44:35 2008
Host: csarcsys1-eth0.xxx.xxx.nasa.gov
Info from csarcsys2
[root@csarcsys2-eth0 cluster]# clustat
msg_open: No such file or directory
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
csarcsys1-eth0 1 Offline
csarcsys2-eth0 2 Online, Local
csarcsys3-eth0 3 Offline
/dev/sdd1 0 Online, Quorum Disk
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Panigrahi, Santosh Kumar
Sent: Tuesday, March 25, 2008 7:33 AM
To: linux clustering
Subject: RE: 3 node cluster problems
If you are configuring your cluster with system-config-cluster, then there
is no need to run ricci/luci; ricci and luci are needed when configuring
the cluster with Conga. You can configure it either way.
Looking at your clustat output, it seems the cluster is partitioned (split
brain) into two sub-clusters [Sub1: (csarcsys1-eth0, csarcsys2-eth0);
Sub2: csarcsys3-eth0]. Without a quorum device you can face this situation
more often. To avoid it, you can configure a quorum device with a heuristic
such as a ping check. Use the link
(http://www.redhatmagazine.com/2007/12/19/enhancing-cluster-quorum-with-qdisk/)
for configuring a quorum disk in RHCS.
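As a rough sketch of what a ping heuristic could look like in cluster.conf, reusing the quorumd values already in your config (the 172.24.86.1 gateway address and the score/interval values are only placeholders, not a recommendation):

<quorumd interval="4" label="csarcsysQ" min_score="1" tko="30" votes="2">
        <heuristic program="ping -c1 -w1 172.24.86.1" interval="2" score="1"/>
</quorumd>

If a node's heuristic score falls below min_score, qdiskd declares that node unfit, which helps break exactly this kind of split.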
Thanks,
S
-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Dalton,
Maurice
Sent: Tuesday, March 25, 2008 5:18 PM
To: linux clustering
Subject: RE: 3 node cluster problems
Still no change; same as below.
I completely rebuilt the cluster using system-config-cluster.
The cluster software was installed from RHN; luci and ricci are running.
This is the new config file, and it has been copied to the two other
systems:
[root@csarcsys1-eth0 cluster]# more cluster.conf
<?xml version="1.0"?>
<cluster config_version="5" name="csarcsys5">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices/>
        <rm>
                <failoverdomains>
                        <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                                <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
                                <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
                                <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="172.xx.xx.xxx" monitor_link="1"/>
                        <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
                </resources>
        </rm>
</cluster>
-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie
Thomas
Sent: Monday, March 24, 2008 4:17 PM
To: linux clustering
Subject: Re: 3 node cluster problems
Did you load the cluster software via Conga or manually? You would have
had to load luci on one node and ricci on all three.
Try copying the modified /etc/cluster/cluster.conf from csarcsys1 to the
other two nodes. Make sure you can ping the private interface to/from all
nodes, and reboot. If this does not work,
post your /etc/cluster/cluster.conf file again.
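Something along these lines, run from csarcsys1, is all it takes (assuming root ssh access between the nodes; adjust the node names to whatever resolves over your private interface):

for n in csarcsys2-eth0 csarcsys3-eth0; do
    scp /etc/cluster/cluster.conf root@$n:/etc/cluster/cluster.conf   # push the same config to every node
    ping -c 3 $n                                                      # confirm the private interface answers
done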
Dalton, Maurice wrote:
Yes
I also rebooted again just now to be sure.
-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie
Thomas
Sent: Monday, March 24, 2008 3:33 PM
To: linux clustering
Subject: Re: 3 node cluster problems
When you changed the node names in /etc/cluster/cluster.conf and made
sure the /etc/hosts file had the correct node names (i.e., 10.0.0.100
csarcsys1-eth0 csarcsys1-eth0.xxxx.xxxx.xxx), did you reboot all the nodes
at the same time?
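For reference, a minimal /etc/hosts layout on every node might look like this (the 10.0.0.x addresses and the masked domain are placeholders, not your real values):

10.0.0.100   csarcsys1-eth0   csarcsys1-eth0.xxxx.xxxx.xxx
10.0.0.101   csarcsys2-eth0   csarcsys2-eth0.xxxx.xxxx.xxx
10.0.0.102   csarcsys3-eth0   csarcsys3-eth0.xxxx.xxxx.xxx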
Dalton, Maurice wrote:
No luck. It seems as if csarcsys3 thinks it is in its own cluster.
I renamed all the config files and rebuilt with system-config-cluster.
Clustat command from csarcsys3
[root@csarcsys3-eth0 cluster]# clustat
msg_open: No such file or directory
Member Status: Inquorate
Member Name ID Status
------ ---- ---- ------
csarcsys1-eth0 1 Offline
csarcsys2-eth0 2 Offline
csarcsys3-eth0 3 Online, Local
clustat command from csarcsys2
[root@csarcsys2-eth0 cluster]# clustat
msg_open: No such file or directory
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
csarcsys1-eth0 1 Online
csarcsys2-eth0 2 Online, Local
csarcsys3-eth0 3 Offline
-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie
Thomas
Sent: Monday, March 24, 2008 2:25 PM
To: linux clustering
Subject: Re: 3 node cluster problems
You will also need to make sure the clustered node names are in your
/etc/hosts file.
Also, make sure your cluster network interface is up on all nodes and
that /etc/cluster/cluster.conf is the same on all nodes.
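A quick way to double-check both points on each node (standard tools only; eth0 is just an example interface name):

md5sum /etc/cluster/cluster.conf    # the checksum should match across all three nodes
ip addr show eth0                   # the cluster interface should be UP with the expected address
ping -c 2 csarcsys3-eth0            # each node should reach the others by cluster node name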
Dalton, Maurice wrote:
The last post is incorrect; fencing is still hanging at startup.
Here's another log message:
Mar 24 19:03:14 csarcsys3-eth0 ccsd[6425]: Error while processing connect: Connection refused
Mar 24 19:03:15 csarcsys3-eth0 dlm_controld[6453]: connect to ccs error -111, check ccsd or cluster status
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Monday, March 24, 2008 11:22 AM
To: linux clustering
Subject: Re: 3 node cluster problems
Try removing the fully qualified hostnames from the cluster.conf file.
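In other words, each clusternode entry would change roughly like this (with the matching short names present in /etc/hosts on every node):

before:  <clusternode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" nodeid="1" votes="1">
after:   <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">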
Dalton, Maurice wrote:
I have NO fencing equipment.
I have been tasked to set up a 3-node cluster.
Currently I am having problems getting cman (fence) to start. fenced will
try to start during cman startup but will fail.
I tried to run /sbin/fenced -D and I get the following:
1206373475 cman_init error 0 111
Here's my cluster.conf file
<?xml version="1.0"?>
<cluster alias="csarcsys51" config_version="26" name="csarcsys51">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" nodeid="1" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" nodeid="2" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="csarcsys3-eth0.xxx.xxxxnasa.gov" nodeid="3" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices/>
        <rm>
                <failoverdomains>
                        <failoverdomain name="csarcsys-fo" ordered="1" restricted="0">
                                <failoverdomainnode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" priority="1"/>
                                <failoverdomainnode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" priority="1"/>
                                <failoverdomainnode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="xxx.xxx.xxx.xxx" monitor_link="1"/>
                        <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
                        <nfsexport name="csarcsys-export"/>
                        <nfsclient name="csarcsys-nfs-client" options="no_root_squash,rw" path="/csarc-test" target="xxx.xxx.xxx.*"/>
                </resources>
        </rm>
</cluster>
Messages from the logs:
Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster