Same problem. I now have qdiskd running. I have run diffs on all three
cluster.conf files; all are the same.
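(For what it's worth, a quick way to double-check that all three copies really
are identical is to compare checksums instead of running diffs by hand. A
minimal sketch, assuming root ssh works between the nodes:

    for h in csarcsys1-eth0 csarcsys2-eth0 csarcsys3-eth0; do
        ssh $h md5sum /etc/cluster/cluster.conf
    done

All three lines should show the same checksum.)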
[root@csarcsys1-eth0 cluster]# more cluster.conf
<?xml version="1.0"?>
<cluster config_version="6" name="csarcsys5">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices/>
        <rm>
                <failoverdomains>
                        <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                                <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
                                <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
                                <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="172.24.86.177" monitor_link="1"/>
                        <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739"
                            fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs"
                            options="rw" self_fence="0"/>
                </resources>
        </rm>
        <quorumd interval="4" label="csarcsysQ" min_score="1" tko="30" votes="2"/>
</cluster>
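(Side note on the vote math, in case it helps anyone reading along: three
1-vote nodes plus a 2-vote quorum disk is the usual "last man standing"
layout, with the disk carrying node_count - 1 votes. If I have the qdisk
arithmetic right, that gives 5 expected votes and a quorum of 3, so a single
node that still holds the quorum disk (1 + 2 = 3) can stay quorate on its
own.)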
More info from csarcsys3:

[root@csarcsys3-eth0 cluster]# clustat
msg_open: No such file or directory
Member Status: Inquorate

  Member Name                       ID   Status
  ------ ----                       ---- ------
  csarcsys1-eth0                       1 Offline
  csarcsys2-eth0                       2 Offline
  csarcsys3-eth0                       3 Online, Local
  /dev/sdd1                            0 Offline

[root@csarcsys3-eth0 cluster]# mkqdisk -L
mkqdisk v0.5.1
/dev/sdd1:
        Magic:   eb7a62c2
        Label:   csarcsysQ
        Created: Wed Feb 13 13:44:35 2008
        Host:    csarcsys1-eth0.xxx.xxx.nasa.gov

[root@csarcsys3-eth0 cluster]# ls -l /dev/sdd1
brw-r----- 1 root disk 8, 49 Mar 25 14:09 /dev/sdd1
clustat from csarcsys1:

[root@csarcsys1-eth0 cluster]# clustat
msg_open: No such file or directory
Member Status: Quorate

  Member Name                       ID   Status
  ------ ----                       ---- ------
  csarcsys1-eth0                       1 Online, Local
  csarcsys2-eth0                       2 Online
  csarcsys3-eth0                       3 Offline
  /dev/sdd1                            0 Offline, Quorum Disk

[root@csarcsys1-eth0 cluster]# ls -l /dev/sdd1
brw-r----- 1 root disk 8, 49 Mar 25 14:19 /dev/sdd1

mkqdisk v0.5.1
/dev/sdd1:
        Magic:   eb7a62c2
        Label:   csarcsysQ
        Created: Wed Feb 13 13:44:35 2008
        Host:    csarcsys1-eth0.xxx.xxx.nasa.gov
Info from csarcsys2:

[root@csarcsys2-eth0 cluster]# clustat
msg_open: No such file or directory
Member Status: Quorate

  Member Name                       ID   Status
  ------ ----                       ---- ------
  csarcsys1-eth0                       1 Offline
  csarcsys2-eth0                       2 Online, Local
  csarcsys3-eth0                       3 Offline
  /dev/sdd1                            0 Online, Quorum Disk
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx]
On Behalf Of Panigrahi, Santosh Kumar

If you are configuring your cluster with system-config-cluster, there is no
need to run ricci/luci; ricci/luci are only needed when configuring the
cluster with Conga. You can configure it either way.

Looking at your clustat output, it seems the cluster is partitioned (split
brain) into two sub-clusters: Sub1 (csarcsys1-eth0, csarcsys2-eth0) and Sub2
(csarcsys3-eth0). Without a quorum device you can face this situation more
often. To avoid it, configure a quorum device with a heuristic such as a ping
test. See
http://www.redhatmagazine.com/2007/12/19/enhancing-cluster-quorum-with-qdisk/
for configuring a quorum disk in RHCS.

Thanks,
S
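(A minimal sketch of the kind of quorumd stanza Santosh describes, keeping the
timings already in the config above. The heuristic pings the default gateway;
172.24.86.1 is only a placeholder, so substitute an address every healthy node
should always be able to reach:

        <quorumd interval="4" label="csarcsysQ" min_score="1" tko="30" votes="2">
                <heuristic program="ping -c1 -w1 172.24.86.1" score="1" interval="2"/>
        </quorumd>

The idea is that a node which loses the network fails the heuristic, drops
below min_score, and gives up its claim to the quorum disk instead of racing
the healthy nodes for it.)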
-----Original Message-----

Still no change. Same as below. I completely rebuilt the cluster using
system-config-cluster. The cluster software was installed from RHN; luci and
ricci are running. This is the new config file, and it has been copied to the
two other systems.
[root@csarcsys1-eth0 cluster]# more cluster.conf
<?xml version="1.0"?>
<cluster config_version="5" name="csarcsys5">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices/>
        <rm>
                <failoverdomains>
                        <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                                <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
                                <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
                                <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="172.xx.xx.xxx" monitor_link="1"/>
                        <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739"
                            fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs"
                            options="rw" self_fence="0"/>
                </resources>
        </rm>
</cluster>
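(One note on "copied to the two other systems": once ccsd is actually running
on all members, the usual RHCS 5 way to push a changed config is to bump
config_version in the XML and then, on one node, run

    ccs_tool update /etc/cluster/cluster.conf

While the cluster is still broken, though, hand-copying the file and
rebooting, as Bennie suggests below, is the simpler route.)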
-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx]
On Behalf Of Bennie Thomas
Sent: Monday, March 24, 2008 4:17 PM
To: linux clustering
Subject: Re: 3 node cluster problems

Did you load the cluster software via Conga or manually? You would have had
to load luci on one node and ricci on all three.

Try copying the modified /etc/cluster/cluster.conf from csarcsys1 to the
other two nodes. Make sure you can ping the private interface to/from all
nodes, and reboot. If this does not work, post your
/etc/cluster/cluster.conf file again.
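(Roughly what that boils down to with the node names in this thread, as a
sketch rather than a recipe:

    [root@csarcsys1-eth0 ~]# scp /etc/cluster/cluster.conf csarcsys2-eth0:/etc/cluster/
    [root@csarcsys1-eth0 ~]# scp /etc/cluster/cluster.conf csarcsys3-eth0:/etc/cluster/
    [root@csarcsys1-eth0 ~]# for h in csarcsys2-eth0 csarcsys3-eth0; do ping -c3 $h; done

and then reboot all three nodes together.)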
Dalton, Maurice wrote:
> Yes
> I also rebooted again just now to be sure.
>
> -----Original Message-----
> From: linux-cluster-bounces@xxxxxxxxxx
> [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
> Sent: Monday, March 24, 2008 3:33 PM
> To: linux clustering
> Subject: Re: 3 node cluster problems
>
> When you changed the nodenames in /etc/cluster/cluster.conf and made sure
> the /etc/hosts file had the correct nodenames (i.e. 10.0.0.100
> csarcsys1-eth0 csarcsys1-eth0.xxxx.xxxx.xxx), did you reboot all the nodes
> at the same time?
>
> Dalton, Maurice wrote:
>> No luck. It seems as if csarcsys3 thinks it's in its own cluster.
>> I renamed all config files and rebuilt from system-config-cluster.
>>
>> clustat command from csarcsys3:
>>
>> [root@csarcsys3-eth0 cluster]# clustat
>> msg_open: No such file or directory
>> Member Status: Inquorate
>>
>>   Member Name                       ID   Status
>>   ------ ----                       ---- ------
>>   csarcsys1-eth0                       1 Offline
>>   csarcsys2-eth0                       2 Offline
>>   csarcsys3-eth0                       3 Online, Local
>>
>> clustat command from csarcsys2:
>>
>> [root@csarcsys2-eth0 cluster]# clustat
>> msg_open: No such file or directory
>> Member Status: Quorate
>>
>>   Member Name                       ID   Status
>>   ------ ----                       ---- ------
>>   csarcsys1-eth0                       1 Online
>>   csarcsys2-eth0                       2 Online, Local
>>   csarcsys3-eth0                       3 Offline
>>
>> -----Original Message-----
>> From: linux-cluster-bounces@xxxxxxxxxx
>> [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
>> Sent: Monday, March 24, 2008 2:25 PM
>> To: linux clustering
>> Subject: Re: 3 node cluster problems
>>
>> You will also need to make sure the clustered nodenames are in your
>> /etc/hosts file. Also, make sure your cluster network interface is up on
>> all nodes and that the /etc/cluster/cluster.conf is the same on all nodes.
>>
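(For reference, an /etc/hosts layout consistent with that advice would look
something like the sketch below on every node. The 10.0.0.x addresses are only
placeholders for whatever the cluster interconnect actually uses:

    10.0.0.100   csarcsys1-eth0   csarcsys1-eth0.xxx.xxxx.nasa.gov
    10.0.0.101   csarcsys2-eth0   csarcsys2-eth0.xxx.xxxx.nasa.gov
    10.0.0.102   csarcsys3-eth0   csarcsys3-eth0.xxx.xxxx.nasa.gov

The short names should match the clusternode name= entries in cluster.conf.)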
>> Dalton, Maurice wrote:
>>> The last post is incorrect.
>>>
>>> Fence is still hanging at start up.
>>>
>>> Here's another log message:
>>>
>>> Mar 24 19:03:14 csarcsys3-eth0 ccsd[6425]: Error while processing
>>> connect: Connection refused
>>> Mar 24 19:03:15 csarcsys3-eth0 dlm_controld[6453]: connect to ccs
>>> error -111, check ccsd or cluster status
>>>
>>> From: linux-cluster-bounces@xxxxxxxxxx
>>> [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
>>> Sent: Monday, March 24, 2008 11:22 AM
>>> To: linux clustering
>>> Subject: Re: 3 node cluster problems
>>>
>>> try removing the fully qualified hostname from the cluster.conf file.
>>>
>>> Dalton, Maurice wrote:
>>>
>>> I have NO fencing equipment.
>>>
>>> I have been tasked to set up a 3 node cluster.
>>>
>>> Currently I am having problems getting cman (fence) to start.
>>>
>>> Fence will try to start up during cman startup but will fail.
>>>
>>> I tried to run /sbin/fenced -D and I get the following:
>>>
>>> 1206373475 cman_init error 0 111
>>>
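(Manual fencing is not supported for production, but for a cluster with no
fence hardware at all a placeholder setup is commonly sketched like this, so
that fenced has an agent to call; the device and method names here are just
examples:

        <fencedevices>
                <fencedevice agent="fence_manual" name="manual"/>
        </fencedevices>

and, inside each clusternode block:

                <fence>
                        <method name="1">
                                <device name="manual" nodename="csarcsys1-eth0"/>
                        </method>
                </fence>

with nodename matching that node. When a fence is needed you acknowledge it by
hand with fence_ack_manual after making sure the node is really down.)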
>>> Here's my cluster.conf file:
>>>
>>> <?xml version="1.0"?>
>>> <cluster alias="csarcsys51" config_version="26" name="csarcsys51">
>>>         <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>>>         <clusternodes>
>>>                 <clusternode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" nodeid="1" votes="1">
>>>                         <fence/>
>>>                 </clusternode>
>>>                 <clusternode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" nodeid="2" votes="1">
>>>                         <fence/>
>>>                 </clusternode>
>>>                 <clusternode name="csarcsys3-eth0.xxx.xxxxnasa.gov" nodeid="3" votes="1">
>>>                         <fence/>
>>>                 </clusternode>
>>>         </clusternodes>
>>>         <cman/>
>>>         <fencedevices/>
>>>         <rm>
>>>                 <failoverdomains>
>>>                         <failoverdomain name="csarcsys-fo" ordered="1" restricted="0">
>>>                                 <failoverdomainnode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" priority="1"/>
>>>                                 <failoverdomainnode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" priority="1"/>
>>>                                 <failoverdomainnode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" priority="1"/>
>>>                         </failoverdomain>
>>>                 </failoverdomains>
>>>                 <resources>
>>>                         <ip address="xxx.xxx.xxx.xxx" monitor_link="1"/>
>>>                         <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739"
>>>                             fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs"
>>>                             options="rw" self_fence="0"/>
>>>                         <nfsexport name="csarcsys-export"/>
>>>                         <nfsclient name="csarcsys-nfs-client" options="no_root_squash,rw"
>>>                                    path="/csarc-test" target="xxx.xxx.xxx.*"/>
>>>                 </resources>
>>>         </rm>
>>> </cluster>
>>>
>>> Messages from the logs:
>>>
>>> Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate.  Refusing connection.
>>> Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
>>> Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate.  Refusing connection.
>>> Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
>>> Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate.  Refusing connection.
>>> Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
>>> Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate.  Refusing connection.
>>> Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
>>> Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate.  Refusing connection.
>>> Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster