RE: 3 node cluster problems


Same problem.

I now have qdiskd running.

 

I have run diffs on all three cluster.conf files; all are the same.
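
For reference, one way to run that comparison (a minimal sketch; it assumes root ssh from csarcsys1 to the other two nodes):

        # no output from diff means the local and remote copies are identical
        for n in csarcsys2-eth0 csarcsys3-eth0; do
                ssh $n cat /etc/cluster/cluster.conf | diff /etc/cluster/cluster.conf -
        done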

 

 

[root@csarcsys1-eth0 cluster]# more cluster.conf
<?xml version="1.0"?>
<cluster config_version="6" name="csarcsys5">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices/>
        <rm>
                <failoverdomains>
                        <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                                <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
                                <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
                                <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="172.24.86.177" monitor_link="1"/>
                        <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
                </resources>
        </rm>
        <quorumd interval="4" label="csarcsysQ" min_score="1" tko="30" votes="2"/>
</cluster>

 

 

More info from csarcsys3

 

[root@csarcsys3-eth0 cluster]# clustat
msg_open: No such file or directory
Member Status: Inquorate

  Member Name                        ID   Status
  ------ ----                        ---- ------
  csarcsys1-eth0                        1 Offline
  csarcsys2-eth0                        2 Offline
  csarcsys3-eth0                        3 Online, Local
  /dev/sdd1                             0 Offline

[root@csarcsys3-eth0 cluster]# mkqdisk -L
mkqdisk v0.5.1
/dev/sdd1:
        Magic:   eb7a62c2
        Label:   csarcsysQ
        Created: Wed Feb 13 13:44:35 2008
        Host:    csarcsys1-eth0.xxx.xxx.nasa.gov
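
For reference, a label like this would have been written once from csarcsys1 with something along these lines (a sketch; it assumes /dev/sdd1 is the same shared partition on all three nodes):

        # initialize the quorum partition with the label that <quorumd label="csarcsysQ"/> points at
        mkqdisk -c /dev/sdd1 -l csarcsysQ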

 

 

 

[root@csarcsys3-eth0 cluster]# ls -l /dev/sdd1

brw-r----- 1 root disk 8, 49 Mar 25 14:09 /dev/sdd1

 

 

 

clustat from csarcsys1:

msg_open: No such file or directory
Member Status: Quorate

  Member Name                        ID   Status
  ------ ----                        ---- ------
  csarcsys1-eth0                        1 Online, Local
  csarcsys2-eth0                        2 Online
  csarcsys3-eth0                        3 Offline
  /dev/sdd1                             0 Offline, Quorum Disk

[root@csarcsys1-eth0 cluster]# ls -l /dev/sdd1
brw-r----- 1 root disk 8, 49 Mar 25 14:19 /dev/sdd1

[root@csarcsys1-eth0 cluster]# mkqdisk -L
mkqdisk v0.5.1
/dev/sdd1:
        Magic:   eb7a62c2
        Label:   csarcsysQ
        Created: Wed Feb 13 13:44:35 2008
        Host:    csarcsys1-eth0.xxx.xxx.nasa.gov

Info from csarcsys2:

[root@csarcsys2-eth0 cluster]# clustat
msg_open: No such file or directory
Member Status: Quorate

  Member Name                        ID   Status
  ------ ----                        ---- ------
  csarcsys1-eth0                        1 Offline
  csarcsys2-eth0                        2 Online, Local
  csarcsys3-eth0                        3 Offline
  /dev/sdd1                             0 Online, Quorum Disk
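
For completeness, each node's vote and quorum counts can be cross-checked with cman_tool (commands only; output omitted here):

        # inspect votes, expected votes and membership as seen by this node
        cman_tool status
        cman_tool nodes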


From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Panigrahi, Santosh Kumar
Sent: Tuesday, March 25, 2008 7:33 AM
To: linux clustering
Subject: RE: 3 node cluster problems

 

If you are configuring your cluster with system-config-cluster, there is no need to run ricci/luci. Ricci and luci are only needed when configuring the cluster with Conga. You can configure it either way.

Looking at your clustat outputs, it seems the cluster is partitioned (split brain) into two sub-clusters [Sub1: (csarcsys1-eth0, csarcsys2-eth0); Sub2: csarcsys3-eth0]. Without a quorum device you will face this situation more often. To avoid it, you can configure a quorum device with a heuristic such as a ping test. See http://www.redhatmagazine.com/2007/12/19/enhancing-cluster-quorum-with-qdisk/ for configuring a quorum disk in RHCS.
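
For example, a quorumd stanza with a ping heuristic might look like the following (a sketch only; 172.24.86.1 is a placeholder, substitute a gateway that every node should always be able to reach):

        <quorumd interval="4" label="csarcsysQ" min_score="1" tko="30" votes="2">
                <!-- placeholder address; pick a router reachable from all nodes -->
                <heuristic program="ping -c1 -w1 172.24.86.1" score="1" interval="2" tko="5"/>
        </quorumd>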

Thanks,

S

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Dalton, Maurice
Sent: Tuesday, March 25, 2008 5:18 PM
To: linux clustering
Subject: RE: 3 node cluster problems

Still no change. Same as below.

I completely rebuilt the cluster using system-config-cluster. The cluster software was installed from RHN; luci and ricci are running.

This is the new config file, and it has been copied to the two other systems:

 

[root@csarcsys1-eth0 cluster]# more cluster.conf
<?xml version="1.0"?>
<cluster config_version="5" name="csarcsys5">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices/>
        <rm>
                <failoverdomains>
                        <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                                <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
                                <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
                                <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="172.xx.xx.xxx" monitor_link="1"/>
                        <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
                </resources>
        </rm>
</cluster>
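
Note that <fence/> and <fencedevices/> are still empty because we have no fence hardware. One option I have seen suggested for such setups (untested here; fence_manual is not supported for production use) is to declare manual fencing so fenced has a method to run, e.g.:

        <fencedevices>
                <!-- hypothetical placeholder; fence_manual is unsupported for production -->
                <fencedevice agent="fence_manual" name="manual"/>
        </fencedevices>

and, per node:

        <fence>
                <method name="1">
                        <device name="manual" nodename="csarcsys1-eth0"/>
                </method>
        </fence>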

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Monday, March 24, 2008 4:17 PM
To: linux clustering
Subject: Re: 3 node cluster problems

Did you load the cluster software via Conga or manually? You would have had to load luci on one node and ricci on all three.

Try copying the modified /etc/cluster/cluster.conf from csarcsys1 to the other two nodes. Make sure you can ping the private interface to/from all nodes and reboot. If this does not work, post your /etc/cluster/cluster.conf file again.
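
For example (a rough sketch; it assumes root ssh between the nodes):

        # push the config from csarcsys1 to the other members
        scp /etc/cluster/cluster.conf csarcsys2-eth0:/etc/cluster/
        scp /etc/cluster/cluster.conf csarcsys3-eth0:/etc/cluster/
        # verify the cluster interfaces answer from this node
        for n in csarcsys1-eth0 csarcsys2-eth0 csarcsys3-eth0; do ping -c 3 $n; done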

 

Dalton, Maurice wrote:

> Yes

> I also rebooted again just now to be sure.

> 

> 

> -----Original Message-----

> From: linux-cluster-bounces@xxxxxxxxxx

> [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas

> Sent: Monday, March 24, 2008 3:33 PM

> To: linux clustering

> Subject: Re: 3 node cluster problems

> 

> When you changed the nodenames in the /etc/cluster/cluster.conf and
> made sure the /etc/hosts file had the correct nodenames (i.e.
> 10.0.0.100  csarcsys1-eth0  csarcsys1-eth0.xxxx.xxxx.xxx), did you
> reboot all the nodes at the same time?

> 

> Dalton, Maurice wrote:

>  

>> No luck. It seems as if csarcsys3 thinks it's in its own cluster.
>> I renamed all config files and rebuilt from system-config-cluster.

>> 

>> Clustat command from csarcsys3

>> 

>> 

>> [root@csarcsys3-eth0 cluster]# clustat
>> msg_open: No such file or directory
>> Member Status: Inquorate
>>
>>   Member Name                        ID   Status
>>   ------ ----                        ---- ------
>>   csarcsys1-eth0                        1 Offline
>>   csarcsys2-eth0                        2 Offline
>>   csarcsys3-eth0                        3 Online, Local

>> 

>> clustat command from csarcsys2
>>
>> [root@csarcsys2-eth0 cluster]# clustat
>> msg_open: No such file or directory
>> Member Status: Quorate
>>
>>   Member Name                        ID   Status
>>   ------ ----                        ---- ------
>>   csarcsys1-eth0                        1 Online
>>   csarcsys2-eth0                        2 Online, Local
>>   csarcsys3-eth0                        3 Offline

>> 

>> 

>> -----Original Message-----

>> From: linux-cluster-bounces@xxxxxxxxxx

>> [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas

>> Sent: Monday, March 24, 2008 2:25 PM

>> To: linux clustering

>> Subject: Re: 3 node cluster problems

>> 

>> You will also need to make sure the clustered nodenames are in your
>> /etc/hosts file. Also, make sure your cluster network interface is up
>> on all nodes and that the /etc/cluster/cluster.conf is the same on
>> all nodes.
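>>
>> For example, each node's /etc/hosts might carry entries like these
>> (the addresses are placeholders):
>>
>>         # placeholder addresses on the cluster interconnect
>>         10.0.0.100  csarcsys1-eth0
>>         10.0.0.101  csarcsys2-eth0
>>         10.0.0.102  csarcsys3-eth0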

>> 

>> 

>> 

>> Dalton, Maurice wrote:


>>> The last post is incorrect.

>>> 

>>> Fence is still hanging at startup.

>>> 

>>> Here's another log message.

>>> 

>>> Mar 24 19:03:14 csarcsys3-eth0 ccsd[6425]: Error while processing

>>> connect: Connection refused

>>> 

>>> Mar 24 19:03:15 csarcsys3-eth0 dlm_controld[6453]: connect to ccs

>>> error -111, check ccsd or cluster status

>>> 

>>> From: linux-cluster-bounces@xxxxxxxxxx
>>> [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
>>> Sent: Monday, March 24, 2008 11:22 AM
>>> To: linux clustering
>>> Subject: Re: 3 node cluster problems

>>> 

>>> Try removing the fully qualified hostname from the cluster.conf file.

>>> 

>>> 

>>> Dalton, Maurice wrote:

>>> 

>>> I have NO fencing equipment.
>>>
>>> I have been tasked to set up a 3 node cluster.
>>>
>>> Currently I am having problems getting cman (fence) to start.
>>> Fence will try to start during cman startup but will fail.
>>>
>>> I tried to run /sbin/fenced -D and I get the following:
>>>
>>> 1206373475 cman_init error 0 111

>>> 

>>> Here's my cluster.conf file

>>> 

>>> <?xml version="1.0"?>
>>> <cluster alias="csarcsys51" config_version="26" name="csarcsys51">
>>>         <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>>>         <clusternodes>
>>>                 <clusternode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" nodeid="1" votes="1">
>>>                         <fence/>
>>>                 </clusternode>
>>>                 <clusternode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" nodeid="2" votes="1">
>>>                         <fence/>
>>>                 </clusternode>
>>>                 <clusternode name="csarcsys3-eth0.xxx.xxxxnasa.gov" nodeid="3" votes="1">
>>>                         <fence/>
>>>                 </clusternode>
>>>         </clusternodes>
>>>         <cman/>
>>>         <fencedevices/>
>>>         <rm>
>>>                 <failoverdomains>
>>>                         <failoverdomain name="csarcsys-fo" ordered="1" restricted="0">
>>>                                 <failoverdomainnode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" priority="1"/>
>>>                                 <failoverdomainnode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" priority="1"/>
>>>                                 <failoverdomainnode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" priority="1"/>
>>>                         </failoverdomain>
>>>                 </failoverdomains>
>>>                 <resources>
>>>                         <ip address="xxx.xxx.xxx.xxx" monitor_link="1"/>
>>>                         <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
>>>                         <nfsexport name="csarcsys-export"/>
>>>                         <nfsclient name="csarcsys-nfs-client" options="no_root_squash,rw" path="/csarc-test" target="xxx.xxx.xxx.*"/>
>>>                 </resources>
>>>         </rm>
>>> </cluster>

>>> 

>>> Messages from the logs

>>> 

>>> Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
>>> Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
>>> Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
>>> Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
>>> Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
>>> Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
>>> Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
>>> Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
>>> Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
>>> Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
