Re: 3 node cluster problems

Are you using a private VLAN for your cluster communications? If not, you should be; the communication between the clustered nodes is very chatty. Just my opinion.
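
For what it's worth, a minimal sketch of a dedicated cluster interface on RHEL 5 (eth1 and the 192.168.100.x subnet are hypothetical; cman uses whichever interface the cluster node name resolves to):

# /etc/sysconfig/network-scripts/ifcfg-eth1 -- hypothetical second NIC
# on a private VLAN reserved for cluster/heartbeat traffic
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.100.1
NETMASK=255.255.255.0
ONBOOT=yes

# then point the cluster node name at that private address in /etc/hosts:
# 192.168.100.1   csarcsys1-eth0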

These are my opinions and experiences.

Any views or opinions presented are solely those of the author and do not necessarily represent those of Raytheon unless specifically stated. Electronic communications, including email, might be monitored by Raytheon for operational or business reasons.


Dalton, Maurice wrote:
Cisco 3550


-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Thursday, March 27, 2008 9:53 AM
To: linux clustering
Subject: Re:  3 node cluster problems

What is the switch brand? I have read that RHCS has problems with certain switches.

Dalton, Maurice wrote:
Switches

Storage is fiber


-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Thursday, March 27, 2008 9:04 AM
To: linux clustering
Subject: Re:  3 node cluster problems

How are your cluster connections made? (i.e. are you using a hub, a switch, or direct-connecting the heartbeat cables?)

Dalton, Maurice wrote:
Still having the problem; I can't figure it out.
I just upgraded to the latest 5.1 cman. No help!


-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Tuesday, March 25, 2008 10:57 AM
To: linux clustering
Subject: Re:  3 node cluster problems


Glad they are working. I have not used LVM with our clusters. You have now piqued my curiosity, and I will have to try building one. So were you also using GFS?

Dalton, Maurice wrote:
Sorry, but security here will not allow me to send host files.

BUT.


I was getting this in /var/log/messages on csarcsys3

Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate. Refusing connection.
Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Error while processing connect: Connection refused
Mar 25 15:26:12 csarcsys3-eth0 dlm_controld[7476]: connect to ccs error -111, check ccsd or cluster status
Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate. Refusing connection.
Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Error while processing connect: Connection refused


I had /dev/vg0/gfsvol on these systems.

I did an lvremove and restarted cman on all systems, and for some strange reason my clusters are working.

It doesn't make any sense.
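
Roughly, the sequence described above (a sketch, not the exact commands; assumes a stock RHEL 5 cman init script):

lvchange -an /dev/vg0/gfsvol     # deactivate the logical volume first
lvremove /dev/vg0/gfsvol         # remove it
# then, on every node:
service cman restart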

I can't thank you enough for your help!


Thanks.


-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Tuesday, March 25, 2008 10:27 AM
To: linux clustering
Subject: Re:  3 node cluster problems

I am currently running several 3-node clusters without a quorum disk. However, if you want your cluster to keep running when only one node is up, then you will need a quorum disk. Can you send your /etc/hosts file for all systems? Also, could there be another node name called csarcsys3-eth0 in your NIS or DNS?
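
(Rough vote math, assuming the usual cman rule of quorum = expected_votes/2 + 1: three nodes at one vote each plus a two-vote quorum disk gives expected_votes = 5 and quorum = 3, so a lone surviving node plus the quorum disk counts 1 + 2 = 3 votes and stays quorate; without the quorum disk a lone node has only 1 of 3 votes and is inquorate.)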

I configured some using Conga and some with system-config-cluster. When using system-config-cluster I basically run the config on all nodes, just adding the node names and the cluster name. I reboot all nodes to make sure they see each other, then go back and modify the config files.

The file /var/log/messages should also shed some light on the problem.
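
For example, each node's /etc/hosts should carry the same entries, along these lines (the addresses here are made up):

10.0.0.100   csarcsys1-eth0   csarcsys1-eth0.xxx.xxx.nasa.gov
10.0.0.101   csarcsys2-eth0   csarcsys2-eth0.xxx.xxx.nasa.gov
10.0.0.102   csarcsys3-eth0   csarcsys3-eth0.xxx.xxx.nasa.gov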
Dalton, Maurice wrote:
Same problem.

I now have qdiskd running.

I have run diffs on all three cluster.conf files; they are all the same.

[root@csarcsys1-eth0 cluster]# more cluster.conf

<?xml version="1.0"?>
<cluster config_version="6" name="csarcsys5">
    <fence_daemon post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
            <fence/>
        </clusternode>
    </clusternodes>
    <cman/>
    <fencedevices/>
    <rm>
        <failoverdomains>
            <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
                <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
                <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <ip address="172.24.86.177" monitor_link="1"/>
            <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
        </resources>
    </rm>
    <quorumd interval="4" label="csarcsysQ" min_score="1" tko="30" votes="2"/>
</cluster>

More info from csarcsys3

[root@csarcsys3-eth0 cluster]# clustat
msg_open: No such file or directory
Member Status: Inquorate

  Member Name        ID   Status
  ------ ----        ---- ------
  csarcsys1-eth0        1 Offline
  csarcsys2-eth0        2 Offline
  csarcsys3-eth0        3 Online, Local
  /dev/sdd1             0 Offline

[root@csarcsys3-eth0 cluster]# mkqdisk -L
mkqdisk v0.5.1
/dev/sdd1:
    Magic:   eb7a62c2
    Label:   csarcsysQ
    Created: Wed Feb 13 13:44:35 2008
    Host:    csarcsys1-eth0.xxx.xxx.nasa.gov

[root@csarcsys3-eth0 cluster]# ls -l /dev/sdd1
brw-r----- 1 root disk 8, 49 Mar 25 14:09 /dev/sdd1

clustat from csarcsys1:

msg_open: No such file or directory
Member Status: Quorate

  Member Name        ID   Status
  ------ ----        ---- ------
  csarcsys1-eth0        1 Online, Local
  csarcsys2-eth0        2 Online
  csarcsys3-eth0        3 Offline
  /dev/sdd1             0 Offline, Quorum Disk

[root@csarcsys1-eth0 cluster]# ls -l /dev/sdd1
brw-r----- 1 root disk 8, 49 Mar 25 14:19 /dev/sdd1

mkqdisk v0.5.1
/dev/sdd1:
    Magic:   eb7a62c2
    Label:   csarcsysQ
    Created: Wed Feb 13 13:44:35 2008
    Host:    csarcsys1-eth0.xxx.xxx.nasa.gov

Info from csarcsys2:

[root@csarcsys2-eth0 cluster]# clustat
msg_open: No such file or directory
Member Status: Quorate

  Member Name        ID   Status
  ------ ----        ---- ------
  csarcsys1-eth0        1 Offline
  csarcsys2-eth0        2 Online, Local
  csarcsys3-eth0        3 Offline
  /dev/sdd1             0 Online, Quorum Disk

From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Panigrahi, Santosh Kumar
Sent: Tuesday, March 25, 2008 7:33 AM
To: linux clustering
Subject: RE: 3 node cluster problems

If you are configuring your cluster with system-config-cluster, then there is no need to run ricci/luci; ricci/luci are needed for configuring the cluster with Conga. You can configure it either way.

Looking at your clustat output, it seems the cluster is partitioned (split brain) into two sub-clusters [1: (csarcsys1-eth0, csarcsys2-eth0), 2: csarcsys3-eth0]. Without a quorum device you can face this situation more often. To avoid it, you can configure a quorum device with a heuristic such as a ping. See http://www.redhatmagazine.com/2007/12/19/enhancing-cluster-quorum-with-qdisk/ for configuring a quorum disk in RHCS.
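
For example, the quorum-disk stanza with a ping heuristic looks roughly like this (the label, gateway address, and timing values below are only placeholders; see the article for the details):

<quorumd interval="2" tko="10" votes="2" label="csarcsysQ">
    <heuristic program="ping -c1 -w1 172.24.86.1" score="1" interval="2" tko="3"/>
</quorumd>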

Thanks,

S

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Dalton, Maurice
Sent: Tuesday, March 25, 2008 5:18 PM
To: linux clustering
Subject: RE: 3 node cluster problems

Still no change. Same as below.

I completely rebuilt the cluster using system-config-cluster. The cluster software was installed from RHN; luci and ricci are running. This is the new config file, and it has been copied to the two other systems:

[root@csarcsys1-eth0 cluster]# more cluster.conf

<?xml version="1.0"?>
<cluster config_version="5" name="csarcsys5">
    <fence_daemon post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
            <fence/>
        </clusternode>
    </clusternodes>
    <cman/>
    <fencedevices/>
    <rm>
        <failoverdomains>
            <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
                <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
                <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <ip address="172.xx.xx.xxx" monitor_link="1"/>
            <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
        </resources>
    </rm>
</cluster>

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Monday, March 24, 2008 4:17 PM
To: linux clustering
Subject: Re: 3 node cluster problems

Did you load the cluster software via Conga or manually? You would have had to load luci on one node and ricci on all three.

Try copying the modified /etc/cluster/cluster.conf from csarcsys1 to the other two nodes. Make sure you can ping the private interface to/from all nodes, and reboot. If this does not work, post your /etc/cluster/cluster.conf file again.
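
Something along these lines (assumes root ssh between the nodes; adjust the node names if yours differ):

# on csarcsys1: push the config to the other two nodes
for node in csarcsys2-eth0 csarcsys3-eth0; do
    scp /etc/cluster/cluster.conf root@${node}:/etc/cluster/cluster.conf
done

# then check that the cluster interface on each node answers
for node in csarcsys2-eth0 csarcsys3-eth0; do
    ping -c 3 ${node}
done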

Dalton, Maurice wrote:

Yes
I also rebooted again just now to be sure.

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Monday, March 24, 2008 3:33 PM
To: linux clustering
Subject: Re: 3 node cluster problems

When you changed the node names in /etc/cluster/cluster.conf and made sure the /etc/hosts file had the correct node names (i.e. 10.0.0.100 csarcsys1-eth0 csarcsys1-eth0.xxxx.xxxx.xxx), did you reboot all the nodes at the same time?

Dalton, Maurice wrote:
No luck. It seems as if csarcsys3 thinks it is in its own cluster.

I renamed all config files and rebuilt from system-config-cluster.

clustat command from csarcsys3:

[root@csarcsys3-eth0 cluster]# clustat
msg_open: No such file or directory
Member Status: Inquorate

  Member Name        ID   Status
  ------ ----        ---- ------
  csarcsys1-eth0        1 Offline
  csarcsys2-eth0        2 Offline
  csarcsys3-eth0        3 Online, Local

clustat command from csarcsys2:

[root@csarcsys2-eth0 cluster]# clustat
msg_open: No such file or directory
Member Status: Quorate

  Member Name        ID   Status
  ------ ----        ---- ------
  csarcsys1-eth0        1 Online
  csarcsys2-eth0        2 Online, Local
  csarcsys3-eth0        3 Offline

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Monday, March 24, 2008 2:25 PM
To: linux clustering
Subject: Re: 3 node cluster problems

You will also need to make sure the clustered node names are in your /etc/hosts file. Also, make sure your cluster network interface is up on all nodes and that /etc/cluster/cluster.conf is the same on all nodes.

Dalton, Maurice wrote:
The last post is incorrect.
Fence is still hanging at start up.

Here's another log message:

Mar 24 19:03:14 csarcsys3-eth0 ccsd[6425]: Error while processing connect: Connection refused
Mar 24 19:03:15 csarcsys3-eth0 dlm_controld[6453]: connect to ccs error -111, check ccsd or cluster status

From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Monday, March 24, 2008 11:22 AM
To: linux clustering
Subject: Re: 3 node cluster problems

Try removing the fully qualified hostname from the cluster.conf file.
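
In other words, change each node entry from the fully qualified form to the short host name, e.g.

<clusternode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" nodeid="1" votes="1">

becomes

<clusternode name="csarcsys1-eth0" nodeid="1" votes="1">

and make sure the short names resolve on every node.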

Dalton, Maurice wrote:
I have NO fencing equipment. I have been tasked to set up a 3 node cluster. Currently I am having problems getting cman (fence) to start. Fence will try to start up during cman startup but will fail. I tried to run /sbin/fenced -D and I get the following:

1206373475 cman_init error 0 111

Here's my cluster.conf file:

<?xml version="1.0"?>
<cluster alias="csarcsys51" config_version="26" name="csarcsys51">
    <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" nodeid="1" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" nodeid="2" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys3-eth0.xxx.xxxxnasa.gov" nodeid="3" votes="1">
            <fence/>
        </clusternode>
    </clusternodes>
    <cman/>
    <fencedevices/>
    <rm>
        <failoverdomains>
            <failoverdomain name="csarcsys-fo" ordered="1" restricted="0">
                <failoverdomainnode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" priority="1"/>
                <failoverdomainnode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" priority="1"/>
                <failoverdomainnode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <ip address="xxx.xxx.xxx.xxx" monitor_link="1"/>
            <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
            <nfsexport name="csarcsys-export"/>
            <nfsclient name="csarcsys-nfs-client" options="no_root_squash,rw" path="/csarc-test" target="xxx.xxx.xxx.*"/>
        </resources>
    </rm>
</cluster>

Messages from the logs:

Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused




--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
