Re: 3 node cluster problems

Glad they are working. I have not used LVM with our clusters. You have now piqued my curiosity, and I will have to try building one. So were you also using GFS?

Dalton, Maurice wrote:
Sorry, but security here will not allow me to send the hosts files.

BUT.


I was getting this in /var/log/messages on csarcsys3

Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate. Refusing connection.
Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Error while processing connect: Connection refused
Mar 25 15:26:12 csarcsys3-eth0 dlm_controld[7476]: connect to ccs error -111, check ccsd or cluster status
Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate. Refusing connection.
Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Error while processing connect: Connection refused


I had /dev/vg0/gfsvol on these systems.

I did an lvremove, restarted cman on all systems, and for some strange reason my clusters are working.

It doesn't make any sense.
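For the archives, a rough sketch of the sequence described above (the volume path /dev/vg0/gfsvol is from this thread, the service name assumes the stock RHEL 5 init scripts, and anything mounted from that volume would have to be unmounted first):

  lvremove /dev/vg0/gfsvol     # remove the leftover logical volume
  service cman restart         # restart cman on every node
  clustat                      # all three members should come back Online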

I can't thank you enough for your help!


Thanks.


-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Tuesday, March 25, 2008 10:27 AM
To: linux clustering
Subject: Re:  3 node cluster problems

I am currently running several 3-node clusters without a quorum disk. However, if you want your cluster to keep running when only one node is up, you will need a quorum disk. Can you send the /etc/hosts file from each system? Also, could there be another node named csarcsys3-eth0 in your NIS or DNS?
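As a rough illustration of why (standard cman vote arithmetic; the votes="2" value matches the quorumd line quoted later in this thread):

  # quorum = expected_votes/2 + 1 (integer division)
  #   3 nodes x 1 vote            -> expected_votes = 3, quorum = 2
  #   a single surviving node     -> 1 vote < 2, so inquorate
  #   add a qdisk with votes="2"  -> expected_votes = 5, quorum = 3
  #   single node + qdisk         -> 1 + 2 = 3, so quorate
  cman_tool status | egrep 'Expected votes|Total votes|Quorum'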

I configured some using Conga and some with system-config-cluster. When using system-config-cluster, I basically run the config on all nodes, adding just the node names and cluster name. I reboot all nodes to make sure they see each other, then go back and modify the config files.
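For what it's worth, a minimal sketch of pushing an edited config out by hand, assuming csarcsys1 holds the edited copy and that config_version was bumped from 6 to 7 (that number is just an example):

  scp /etc/cluster/cluster.conf csarcsys2-eth0:/etc/cluster/
  scp /etc/cluster/cluster.conf csarcsys3-eth0:/etc/cluster/
  ccs_tool update /etc/cluster/cluster.conf   # push the new config through ccsd
  cman_tool version -r 7                      # tell cman about the new config_version
  cman_tool nodes                             # confirm all members still see each other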

The file /var/log/messages should also shed some light on the problem.

Dalton, Maurice wrote:
Same problem.

I now have qdiskd running.

I have run diffs on all three cluster.conf files; they are all the same.

[root@csarcsys1-eth0 cluster]# more cluster.conf

<?xml version="1.0"?>
<cluster config_version="6" name="csarcsys5">
    <fence_daemon post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
            <fence/>
        </clusternode>
    </clusternodes>
    <cman/>
    <fencedevices/>
    <rm>
        <failoverdomains>
            <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
                <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
                <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <ip address="172.24.86.177" monitor_link="1"/>
            <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
        </resources>
    </rm>
    <quorumd interval="4" label="csarcsysQ" min_score="1" tko="30" votes="2"/>
</cluster>
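With that quorumd stanza in place, a quick way to confirm each node agrees on the vote count (standard cman/qdisk commands; nothing here assumes anything beyond the label above):

  service qdiskd start                                    # qdiskd must be running on every node
  mkqdisk -L                                              # should list the csarcsysQ label on all three nodes
  cman_tool status | egrep 'Expected votes|Total votes|Quorum'
  clustat                                                 # /dev/sdd1 should show "Online, Quorum Disk" everywhere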

More info from csarcsys3:

[root@csarcsys3-eth0 cluster]# clustat
msg_open: No such file or directory
Member Status: Inquorate

  Member Name         ID   Status
  ------ ----         ---- ------
  csarcsys1-eth0      1    Offline
  csarcsys2-eth0      2    Offline
  csarcsys3-eth0      3    Online, Local
  /dev/sdd1           0    Offline

[root@csarcsys3-eth0 cluster]# mkqdisk -L
mkqdisk v0.5.1
/dev/sdd1:
    Magic:   eb7a62c2
    Label:   csarcsysQ
    Created: Wed Feb 13 13:44:35 2008
    Host:    csarcsys1-eth0.xxx.xxx.nasa.gov

[root@csarcsys3-eth0 cluster]# ls -l /dev/sdd1
brw-r----- 1 root disk 8, 49 Mar 25 14:09 /dev/sdd1

clustat from csarcsys1:

msg_open: No such file or directory
Member Status: Quorate

  Member Name         ID   Status
  ------ ----         ---- ------
  csarcsys1-eth0      1    Online, Local
  csarcsys2-eth0      2    Online
  csarcsys3-eth0      3    Offline
  /dev/sdd1           0    Offline, Quorum Disk

[root@csarcsys1-eth0 cluster]# ls -l /dev/sdd1
brw-r----- 1 root disk 8, 49 Mar 25 14:19 /dev/sdd1

mkqdisk v0.5.1
/dev/sdd1:
    Magic:   eb7a62c2
    Label:   csarcsysQ
    Created: Wed Feb 13 13:44:35 2008
    Host:    csarcsys1-eth0.xxx.xxx.nasa.gov

Info from csarcsys2:

[root@csarcsys2-eth0 cluster]# clustat
msg_open: No such file or directory
Member Status: Quorate

  Member Name         ID   Status
  ------ ----         ---- ------
  csarcsys1-eth0      1    Offline
  csarcsys2-eth0      2    Online, Local
  csarcsys3-eth0      3    Offline
  /dev/sdd1           0    Online, Quorum Disk

From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Panigrahi, Santosh Kumar
Sent: Tuesday, March 25, 2008 7:33 AM
To: linux clustering
Subject: RE: 3 node cluster problems

If you are configuring your cluster with system-config-cluster, there is no need to run ricci/luci; they are only needed when configuring the cluster through Conga. You can configure it either way.

Looking at your clustat output, it seems the cluster is partitioned (split-brain) into two sub-clusters [Sub1: (csarcsys1-eth0, csarcsys2-eth0), Sub2: (csarcsys3-eth0)]. Without a quorum device you will run into this situation more often. To avoid it, you can configure a quorum device with a heuristic such as a ping. See http://www.redhatmagazine.com/2007/12/19/enhancing-cluster-quorum-with-qdisk/ for how to configure a quorum disk in RHCS.
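As a rough sketch of that setup (the device /dev/sdd1 and label csarcsysQ come from the clustat/mkqdisk output earlier in this thread; the gateway address in the ping heuristic is a placeholder you would replace with something on your cluster network, and re-running mkqdisk -c re-initializes the partition):

  mkqdisk -c /dev/sdd1 -l csarcsysQ   # label a small shared partition as the quorum disk
  # in cluster.conf, point <quorumd label="csarcsysQ" ...> at this label and give it, e.g.,
  # <heuristic program="ping -c1 -w1 172.24.86.1" score="1" interval="2" tko="3"/>
  service qdiskd start                # start qdiskd on every node after cman
  chkconfig qdiskd on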

Thanks,

S

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Dalton, Maurice
Sent: Tuesday, March 25, 2008 5:18 PM
To: linux clustering
Subject: RE:  3 node cluster problems

Still no change. Same as below.

I completely rebuilt the cluster using system-config-cluster.

The cluster software was installed from RHN; luci and ricci are running. This is the new config file, and it has been copied to the two other systems:

[root@csarcsys1-eth0 cluster]# more cluster.conf

<?xml version="1.0"?>
<cluster config_version="5" name="csarcsys5">
    <fence_daemon post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
            <fence/>
        </clusternode>
    </clusternodes>
    <cman/>
    <fencedevices/>
    <rm>
        <failoverdomains>
            <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
                <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
                <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <ip address="172.xx.xx.xxx" monitor_link="1"/>
            <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
        </resources>
    </rm>
</cluster>

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Monday, March 24, 2008 4:17 PM
To: linux clustering
Subject: Re: 3 node cluster problems

Did you load the cluster software via Conga or manually? You would have had to load luci on one node and ricci on all three.

Try copying the modified /etc/cluster/cluster.conf from csarcsys1 to the other two nodes.

Make sure you can ping the private interface to/from all nodes and reboot. If this does not work, post your /etc/cluster/cluster.conf file again.
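A quick way to sanity-check that on each node (node names from this thread; the short and fully qualified forms are whatever your /etc/hosts actually defines):

  grep csarcsys /etc/hosts                # same entries, same addresses, on all three nodes
  for n in csarcsys1-eth0 csarcsys2-eth0 csarcsys3-eth0; do
      ping -c1 $n && echo "$n ok"
  done
  md5sum /etc/cluster/cluster.conf        # run on every node; the sums must match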

Dalton, Maurice wrote:

Yes. I also rebooted again just now to be sure.

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Monday, March 24, 2008 3:33 PM
To: linux clustering
Subject: Re: 3 node cluster problems

When you changed the node names in /etc/cluster/cluster.conf and made sure the /etc/hosts file had the correct node names (i.e. 10.0.0.100 csarcsys1-eth0 csarcsys1-eth0.xxxx.xxxx.xxx), did you reboot all the nodes at the same time?

Dalton, Maurice wrote:
No luck. It seems as if csarcsys3 thinks it is in its own cluster.
I renamed all config files and rebuilt from system-config-cluster.

clustat command from csarcsys3:
[root@csarcsys3-eth0 cluster]# clustat
msg_open: No such file or directory
Member Status: Inquorate

  Member Name         ID   Status
  ------ ----         ---- ------
  csarcsys1-eth0      1    Offline
  csarcsys2-eth0      2    Offline
  csarcsys3-eth0      3    Online, Local

clustat command from csarcsys2:
[root@csarcsys2-eth0 cluster]# clustat
msg_open: No such file or directory
Member Status: Quorate

  Member Name         ID   Status
  ------ ----         ---- ------
  csarcsys1-eth0      1    Online
  csarcsys2-eth0      2    Online, Local
  csarcsys3-eth0      3    Offline

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Monday, March 24, 2008 2:25 PM
To: linux clustering
Subject: Re: 3 node cluster problems

You will also need to make sure the cluster node names are in your /etc/hosts file. Also, make sure your cluster network interface is up on all nodes and that /etc/cluster/cluster.conf is the same on all nodes.

Dalton, Maurice wrote:
The last post is incorrect. Fence is still hanging at start-up. Here's another log message:

Mar 24 19:03:14 csarcsys3-eth0 ccsd[6425]: Error while processing connect: Connection refused
Mar 24 19:03:15 csarcsys3-eth0 dlm_controld[6453]: connect to ccs error -111, check ccsd or cluster status

From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Monday, March 24, 2008 11:22 AM
To: linux clustering
Subject: Re: 3 node cluster problems

Try removing the fully qualified hostname from the cluster.conf file.

Dalton, Maurice wrote:
I have NO fencing equipment. I have been tasked to set up a 3-node cluster. Currently I am having problems getting cman (fence) to start. Fence will try to start up during cman start-up but will fail. I tried to run /sbin/fenced -D and I get the following:

1206373475 cman_init error 0 111

Here's my cluster.conf file:

<?xml version="1.0"?>
<cluster alias="csarcsys51" config_version="26" name="csarcsys51">
    <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" nodeid="1" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" nodeid="2" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys3-eth0.xxx.xxxxnasa.gov" nodeid="3" votes="1">
            <fence/>
        </clusternode>
    </clusternodes>
    <cman/>
    <fencedevices/>
    <rm>
        <failoverdomains>
            <failoverdomain name="csarcsys-fo" ordered="1" restricted="0">
                <failoverdomainnode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" priority="1"/>
                <failoverdomainnode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" priority="1"/>
                <failoverdomainnode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <ip address="xxx.xxx.xxx.xxx" monitor_link="1"/>
            <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
            <nfsexport name="csarcsys-export"/>
            <nfsclient name="csarcsys-nfs-client" options="no_root_squash,rw" path="/csarc-test" target="xxx.xxx.xxx.*"/>
        </resources>
    </rm>
</cluster>

Messages from the logs:

Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused







--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
