Re: 3 node cluster problems

Are you using a private VLAN for your cluster communications? If not, you should be; the communication between the clustered nodes is very chatty. Just my opinion.
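
For what it's worth, a minimal sketch of a dedicated cluster interface on RHEL 5 (eth1 and the 192.168.100.x subnet are hypothetical; cman uses whichever interface the cluster node name resolves to):

# /etc/sysconfig/network-scripts/ifcfg-eth1 -- hypothetical second NIC
# on a private VLAN reserved for cluster/heartbeat traffic
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.100.1
NETMASK=255.255.255.0
ONBOOT=yes

# then point the cluster node name at that private address in /etc/hosts:
# 192.168.100.1   csarcsys1-eth0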

These are my opinions and experiences.

Any views or opinions presented are solely those of the author and do not necessarily represent those of Raytheon unless specifically stated. Electronic communications, including email, might be monitored by Raytheon for operational or business reasons.


Dalton, Maurice wrote:
Cisco 3550


-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Thursday, March 27, 2008 9:53 AM
To: linux clustering
Subject: Re:  3 node cluster problems

What is the switch brand? I have read that RHCS has problems with certain switches.

Dalton, Maurice wrote:
Switches

Storage is fiber


-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Thursday, March 27, 2008 9:04 AM
To: linux clustering
Subject: Re:  3 node cluster problems

How are your cluster connections made? (i.e. are you using a hub, a switch, or direct-connecting the heartbeat cables?)

Dalton, Maurice wrote:
Still having the problem; I can't figure it out.
I just upgraded to the latest 5.1 cman. No help!


-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Tuesday, March 25, 2008 10:57 AM
To: linux clustering
Subject: Re:  3 node cluster problems


Glad they are working. I have not used LVM with our clusters. You have now piqued my curiosity, and I will have to try building one. So were you also using GFS?

Dalton, Maurice wrote:
Sorry, but security here will not allow me to send host files.

BUT.


I was getting this in /var/log/messages on csarcsys3

Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate. Refusing connection.
Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Error while processing connect: Connection refused
Mar 25 15:26:12 csarcsys3-eth0 dlm_controld[7476]: connect to ccs error -111, check ccsd or cluster status
Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate. Refusing connection.
Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Error while processing connect: Connection refused


I had /dev/vg0/gfsvol on these systems.

I did an lvremove and restarted cman on all systems, and for some strange reason my clusters are working.

It doesn't make any sense.
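
Roughly, the sequence described above (a sketch, not the exact commands; assumes a stock RHEL 5 cman init script):

lvchange -an /dev/vg0/gfsvol     # deactivate the logical volume first
lvremove /dev/vg0/gfsvol         # remove it
# then, on every node:
service cman restart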

I can't thank you enough for your help!


Thanks.


-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Tuesday, March 25, 2008 10:27 AM
To: linux clustering
Subject: Re:  3 node cluster problems

I am currently running several 3-node clusters without a quorum disk. However, if you want your cluster to keep running when only one node is up, then you will need a quorum disk. Can you send your /etc/hosts file for all systems? Also, could there be another node name called csarcsys3-eth0 in your NIS or DNS?
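
(Rough vote math, assuming the usual cman rule of quorum = expected_votes/2 + 1: three nodes at one vote each plus a two-vote quorum disk gives expected_votes = 5 and quorum = 3, so a lone surviving node plus the quorum disk counts 1 + 2 = 3 votes and stays quorate; without the quorum disk a lone node has only 1 of 3 votes and is inquorate.)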

I configured some using Conga and some with system-config-cluster. When using system-config-cluster I basically run the config on all nodes, just adding the node names and the cluster name. I reboot all nodes to make sure they see each other, then go back and modify the config files.

The file /var/log/messages should also shed some light on the problem.
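
For example, each node's /etc/hosts should carry the same entries, along these lines (the addresses here are made up):

10.0.0.100   csarcsys1-eth0   csarcsys1-eth0.xxx.xxx.nasa.gov
10.0.0.101   csarcsys2-eth0   csarcsys2-eth0.xxx.xxx.nasa.gov
10.0.0.102   csarcsys3-eth0   csarcsys3-eth0.xxx.xxx.nasa.gov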
Dalton, Maurice wrote:
Same problem.

I now have qdiskd running.

I have run diffs on all three cluster.conf files; they are all the same.

[root@csarcsys1-eth0 cluster]# more cluster.conf

<?xml version="1.0"?>
<cluster config_version="6" name="csarcsys5">
    <fence_daemon post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
            <fence/>
        </clusternode>
    </clusternodes>
    <cman/>
    <fencedevices/>
    <rm>
        <failoverdomains>
            <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
                <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
                <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <ip address="172.24.86.177" monitor_link="1"/>
            <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
        </resources>
    </rm>
    <quorumd interval="4" label="csarcsysQ" min_score="1" tko="30" votes="2"/>
</cluster>

More info from csarcsys3

[root@csarcsys3-eth0 cluster]# clustat
msg_open: No such file or directory
Member Status: Inquorate

  Member Name        ID   Status
  ------ ----        ---- ------
  csarcsys1-eth0        1 Offline
  csarcsys2-eth0        2 Offline
  csarcsys3-eth0        3 Online, Local
  /dev/sdd1             0 Offline

[root@csarcsys3-eth0 cluster]# mkqdisk -L
mkqdisk v0.5.1
/dev/sdd1:
    Magic:   eb7a62c2
    Label:   csarcsysQ
    Created: Wed Feb 13 13:44:35 2008
    Host:    csarcsys1-eth0.xxx.xxx.nasa.gov

[root@csarcsys3-eth0 cluster]# ls -l /dev/sdd1
brw-r----- 1 root disk 8, 49 Mar 25 14:09 /dev/sdd1

clustat from csarcsys1:

msg_open: No such file or directory
Member Status: Quorate

  Member Name        ID   Status
  ------ ----        ---- ------
  csarcsys1-eth0        1 Online, Local
  csarcsys2-eth0        2 Online
  csarcsys3-eth0        3 Offline
  /dev/sdd1             0 Offline, Quorum Disk

[root@csarcsys1-eth0 cluster]# ls -l /dev/sdd1
brw-r----- 1 root disk 8, 49 Mar 25 14:19 /dev/sdd1

mkqdisk v0.5.1
/dev/sdd1:
    Magic:   eb7a62c2
    Label:   csarcsysQ
    Created: Wed Feb 13 13:44:35 2008
    Host:    csarcsys1-eth0.xxx.xxx.nasa.gov

Info from csarcsys2:

[root@csarcsys2-eth0 cluster]# clustat
msg_open: No such file or directory
Member Status: Quorate

  Member Name        ID   Status
  ------ ----        ---- ------
  csarcsys1-eth0        1 Offline
  csarcsys2-eth0        2 Online, Local
  csarcsys3-eth0        3 Offline
  /dev/sdd1             0 Online, Quorum Disk

From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Panigrahi, Santosh Kumar
Sent: Tuesday, March 25, 2008 7:33 AM
To: linux clustering
Subject: RE: 3 node cluster problems

If you are configuring your cluster with system-config-cluster, then there is no need to run ricci/luci; ricci/luci are needed for configuring the cluster with Conga. You can configure it either way.

Looking at your clustat output, it seems the cluster is partitioned (split brain) into two sub-clusters [1: (csarcsys1-eth0, csarcsys2-eth0), 2: csarcsys3-eth0]. Without a quorum device you can face this situation more often. To avoid it, you can configure a quorum device with a heuristic such as a ping. See http://www.redhatmagazine.com/2007/12/19/enhancing-cluster-quorum-with-qdisk/ for configuring a quorum disk in RHCS.
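
For example, the quorum-disk stanza with a ping heuristic looks roughly like this (the label, gateway address, and timing values below are only placeholders; see the article for the details):

<quorumd interval="2" tko="10" votes="2" label="csarcsysQ">
    <heuristic program="ping -c1 -w1 172.24.86.1" score="1" interval="2" tko="3"/>
</quorumd>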

Thanks,

S

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Dalton, Maurice
Sent: Tuesday, March 25, 2008 5:18 PM
To: linux clustering
Subject: RE: 3 node cluster problems

Still no change. Same as below.

I completely rebuilt the cluster using system-config-cluster. The cluster software was installed from RHN; luci and ricci are running. This is the new config file, and it has been copied to the two other systems:

[root@csarcsys1-eth0 cluster]# more cluster.conf

<?xml version="1.0"?>
<cluster config_version="5" name="csarcsys5">
    <fence_daemon post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
            <fence/>
        </clusternode>
    </clusternodes>
    <cman/>
    <fencedevices/>
    <rm>
        <failoverdomains>
            <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
                <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
                <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <ip address="172.xx.xx.xxx" monitor_link="1"/>
            <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
        </resources>
    </rm>
</cluster>

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Monday, March 24, 2008 4:17 PM
To: linux clustering
Subject: Re: 3 node cluster problems

Did you load the cluster software via Conga or manually? You would have had to load luci on one node and ricci on all three.

Try copying the modified /etc/cluster/cluster.conf from csarcsys1 to the other two nodes. Make sure you can ping the private interface to/from all nodes, and reboot. If this does not work, post your /etc/cluster/cluster.conf file again.
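
Something along these lines (assumes root ssh between the nodes; adjust the node names if yours differ):

# on csarcsys1: push the config to the other two nodes
for node in csarcsys2-eth0 csarcsys3-eth0; do
    scp /etc/cluster/cluster.conf root@${node}:/etc/cluster/cluster.conf
done

# then check that the cluster interface on each node answers
for node in csarcsys2-eth0 csarcsys3-eth0; do
    ping -c 3 ${node}
done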

Dalton, Maurice wrote:

Yes
I also rebooted again just now to be sure.

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Monday, March 24, 2008 3:33 PM
To: linux clustering
Subject: Re: 3 node cluster problems

When you changed the node names in /etc/cluster/cluster.conf and made sure the /etc/hosts file had the correct node names (i.e. 10.0.0.100 csarcsys1-eth0 csarcsys1-eth0.xxxx.xxxx.xxx), did you reboot all the nodes at the same time?

Dalton, Maurice wrote:
No luck. It seems as if csarcsys3 thinks it is in its own cluster.

I renamed all config files and rebuilt from system-config-cluster.

clustat command from csarcsys3:

[root@csarcsys3-eth0 cluster]# clustat
msg_open: No such file or directory
Member Status: Inquorate

  Member Name        ID   Status
  ------ ----        ---- ------
  csarcsys1-eth0        1 Offline
  csarcsys2-eth0        2 Offline
  csarcsys3-eth0        3 Online, Local

clustat command from csarcsys2:

[root@csarcsys2-eth0 cluster]# clustat
msg_open: No such file or directory
Member Status: Quorate

  Member Name        ID   Status
  ------ ----        ---- ------
  csarcsys1-eth0        1 Online
  csarcsys2-eth0        2 Online, Local
  csarcsys3-eth0        3 Offline

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Monday, March 24, 2008 2:25 PM
To: linux clustering
Subject: Re: 3 node cluster problems

You will also need to make sure the clustered node names are in your /etc/hosts file. Also, make sure your cluster network interface is up on all nodes and that /etc/cluster/cluster.conf is the same on all nodes.

Dalton, Maurice wrote:
The last post is incorrect.
Fence is still hanging at start up.

Here's another log message:

Mar 24 19:03:14 csarcsys3-eth0 ccsd[6425]: Error while processing connect: Connection refused
Mar 24 19:03:15 csarcsys3-eth0 dlm_controld[6453]: connect to ccs error -111, check ccsd or cluster status

From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Monday, March 24, 2008 11:22 AM
To: linux clustering
Subject: Re: 3 node cluster problems

Try removing the fully qualified hostname from the cluster.conf file.
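
In other words, change each node entry from the fully qualified form to the short host name, e.g.

<clusternode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" nodeid="1" votes="1">

becomes

<clusternode name="csarcsys1-eth0" nodeid="1" votes="1">

and make sure the short names resolve on every node.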

Dalton, Maurice wrote:
I have NO fencing equipment. I have been tasked to set up a 3 node cluster. Currently I am having problems getting cman (fence) to start. Fence will try to start up during cman startup but will fail. I tried to run /sbin/fenced -D and I get the following:

1206373475 cman_init error 0 111

Here's my cluster.conf file:

<?xml version="1.0"?>
<cluster alias="csarcsys51" config_version="26" name="csarcsys51">
    <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" nodeid="1" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" nodeid="2" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys3-eth0.xxx.xxxxnasa.gov" nodeid="3" votes="1">
            <fence/>
        </clusternode>
    </clusternodes>
    <cman/>
    <fencedevices/>
    <rm>
        <failoverdomains>
            <failoverdomain name="csarcsys-fo" ordered="1" restricted="0">
                <failoverdomainnode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" priority="1"/>
                <failoverdomainnode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" priority="1"/>
                <failoverdomainnode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <ip address="xxx.xxx.xxx.xxx" monitor_link="1"/>
            <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
            <nfsexport name="csarcsys-export"/>
            <nfsclient name="csarcsys-nfs-client" options="no_root_squash,rw" path="/csarc-test" target="xxx.xxx.xxx.*"/>
        </resources>
    </rm>
</cluster>

Messages from the logs:

Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused




--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
