I am currently running several 3-node clusters without a quorum disk.
However, if you want your cluster to keep running when only one node is up,
then you will need a quorum disk. Can you send your /etc/hosts file for all
systems? Also, could there be another entry for the node name
csarcsys3-eth0 in your NIS or DNS?
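One quick way to check each box, as a sketch (getent consults whatever name
services nsswitch.conf lists, NIS and DNS included):

[root@csarcsys3-eth0 ~]# getent hosts csarcsys3-eth0
[root@csarcsys3-eth0 ~]# grep csarcsys /etc/hosts

If getent returns an address that is not in /etc/hosts, then NIS or DNS is
supplying a second entry for that name.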
I configured some using Conga and some with system-config-cluster. When
using system-config-cluster, I basically run the config on all nodes,
adding just the node names and the cluster name. I reboot all nodes to make
sure they see each other, then go back and modify the config files.
The file /var/log/messages should also shed some light on the problem.
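For example, something like this on each node (the daemon names here assume
the stock RHCS services):

[root@csarcsys1-eth0 ~]# grep -E 'ccsd|cman|fenced|qdiskd' /var/log/messages | tail -50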
Dalton, Maurice wrote:
Same problem.
I now have qdiskd running.
I have run diffs on all three cluster.conf files; all are the same.
[root@csarcsys1-eth0 cluster]# more cluster.conf
<?xml version="1.0"?>
<cluster config_version="6" name="csarcsys5">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices/>
        <rm>
                <failoverdomains>
                        <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                                <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
                                <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
                                <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="172.24.86.177" monitor_link="1"/>
                        <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
                </resources>
        </rm>
        <quorumd interval="4" label="csarcsysQ" min_score="1" tko="30" votes="2"/>
</cluster>
More info from csarcsys3
[root@csarcsys3-eth0 cluster]# clustat
msg_open: No such file or directory
Member Status: Inquorate
Member Name                              ID   Status
------ ----                              ---- ------
csarcsys1-eth0                              1 Offline
csarcsys2-eth0                              2 Offline
csarcsys3-eth0                              3 Online, Local
/dev/sdd1                                   0 Offline
[root@csarcsys3-eth0 cluster]# mkqdisk -L
mkqdisk v0.5.1
/dev/sdd1:
Magic: eb7a62c2
Label: csarcsysQ
Created: Wed Feb 13 13:44:35 2008
Host: csarcsys1-eth0.xxx.xxx.nasa.gov
[root@csarcsys3-eth0 cluster]# ls -l /dev/sdd1
brw-r----- 1 root disk 8, 49 Mar 25 14:09 /dev/sdd1
clustat from csarcsys1
msg_open: No such file or directory
Member Status: Quorate
Member Name                              ID   Status
------ ----                              ---- ------
csarcsys1-eth0                              1 Online, Local
csarcsys2-eth0                              2 Online
csarcsys3-eth0                              3 Offline
/dev/sdd1                                   0 Offline, Quorum Disk
[root@csarcsys1-eth0 cluster]# ls -l /dev/sdd1
brw-r----- 1 root disk 8, 49 Mar 25 14:19 /dev/sdd1
[root@csarcsys1-eth0 cluster]# mkqdisk -L
mkqdisk v0.5.1
/dev/sdd1:
Magic: eb7a62c2
Label: csarcsysQ
Created: Wed Feb 13 13:44:35 2008
Host: csarcsys1-eth0.xxx.xxx.nasa.gov
Info from csarcsys2
[root@csarcsys2-eth0 cluster]# clustat
msg_open: No such file or directory
Member Status: Quorate
Member Name                              ID   Status
------ ----                              ---- ------
csarcsys1-eth0                              1 Offline
csarcsys2-eth0                              2 Online, Local
csarcsys3-eth0                              3 Offline
/dev/sdd1                                   0 Online, Quorum Disk
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Panigrahi,
Santosh Kumar
Sent: Tuesday, March 25, 2008 7:33 AM
To: linux clustering
Subject: RE: 3 node cluster problems
If you are configuring your cluster with system-config-cluster, then there
is no need to run ricci/luci. Ricci/luci are needed only when configuring
the cluster with Conga. You can configure it either way.
Looking at your clustat outputs, it seems the cluster is partitioned (split
brain) into two sub-clusters: Sub1 (csarcsys1-eth0, csarcsys2-eth0) and
Sub2 (csarcsys3-eth0). Without a quorum device you will run into this
situation more often. To avoid it, you can configure a quorum device with a
heuristic such as a ping test. Use the link
(http://www.redhatmagazine.com/2007/12/19/enhancing-cluster-quorum-with-qdisk/)
for configuring a quorum disk in RHCS.
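As a rough sketch only (the heuristic program and the gateway address
172.24.86.1 are placeholders, and the vote counts must match your layout),
the qdisk-related pieces of cluster.conf would look something like:

<cman expected_votes="5"/>
<quorumd interval="2" tko="10" votes="2" label="csarcsysQ" min_score="1">
        <heuristic program="ping -c1 -w1 172.24.86.1" score="1" interval="2"/>
</quorumd>

With three 1-vote nodes plus a 2-vote qdisk, expected_votes is 5 and quorum
is 3, so a single surviving node that still owns the qdisk (1 + 2 = 3
votes) stays quorate.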
Thanks,
S
-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Dalton, Maurice
Sent: Tuesday, March 25, 2008 5:18 PM
To: linux clustering
Subject: RE: 3 node cluster problems
Still no change. Same as below.
I completely rebuilt the cluster using system-config-cluster.
The cluster software was installed from RHN; luci and ricci are running.
This is the new config file, and it has been copied to the two other
systems:
[root@csarcsys1-eth0 cluster]# more cluster.conf
<?xml version="1.0"?>
<cluster config_version="5" name="csarcsys5">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices/>
        <rm>
                <failoverdomains>
                        <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                                <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
                                <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
                                <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="172.xx.xx.xxx" monitor_link="1"/>
                        <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
                </resources>
        </rm>
</cluster>
-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
Sent: Monday, March 24, 2008 4:17 PM
To: linux clustering
Subject: Re: 3 node cluster problems
Did you load the cluster software via Conga or manually? You would have had
to load luci on one node and ricci on all three.
Try copying the modified /etc/cluster/cluster.conf from csarcsys1 to the
other two nodes. Make sure you can ping the private interface to/from all
nodes, and reboot. If this does not work,
post your /etc/cluster/cluster.conf file again.
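Something along these lines, as a sketch (node names taken from your
config):

[root@csarcsys1-eth0 ~]# scp /etc/cluster/cluster.conf csarcsys2-eth0:/etc/cluster/
[root@csarcsys1-eth0 ~]# scp /etc/cluster/cluster.conf csarcsys3-eth0:/etc/cluster/
[root@csarcsys1-eth0 ~]# for n in csarcsys1-eth0 csarcsys2-eth0 csarcsys3-eth0; do ping -c1 $n; done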
Dalton, Maurice wrote:
> Yes
> I also rebooted again just now to be sure.
>
>
> -----Original Message-----
> From: linux-cluster-bounces@xxxxxxxxxx
> [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
> Sent: Monday, March 24, 2008 3:33 PM
> To: linux clustering
> Subject: Re: 3 node cluster problems
>
> When you changed the node names in /etc/cluster/cluster.conf and made
> sure the /etc/hosts file had the correct node names (i.e. 10.0.0.100
> csarcsys1-eth0 csarcsys1-eth0.xxxx.xxxx.xxx), did you reboot all the
> nodes at the same time?
>
> Dalton, Maurice wrote:
>
>> No luck. It seems as if csarcsys3 thinks it's in its own cluster.
>> I renamed all config files and rebuilt from system-config-cluster
>>
>> Clustat command from csarcsys3
>>
>>
>> [root@csarcsys3-eth0 cluster]# clustat
>> msg_open: No such file or directory
>> Member Status: Inquorate
>>
>> Member Name                              ID   Status
>> ------ ----                              ---- ------
>> csarcsys1-eth0                              1 Offline
>> csarcsys2-eth0                              2 Offline
>> csarcsys3-eth0                              3 Online, Local
>>
>> clustat command from csarcsys2
>>
>> [root@csarcsys2-eth0 cluster]# clustat
>> msg_open: No such file or directory
>> Member Status: Quorate
>>
>> Member Name                              ID   Status
>> ------ ----                              ---- ------
>> csarcsys1-eth0                              1 Online
>> csarcsys2-eth0                              2 Online, Local
>> csarcsys3-eth0                              3 Offline
>>
>>
>> -----Original Message-----
>> From: linux-cluster-bounces@xxxxxxxxxx
>> [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
>> Sent: Monday, March 24, 2008 2:25 PM
>> To: linux clustering
>> Subject: Re: 3 node cluster problems
>>
>> You will also need to make sure the clustered node names are in your
>> /etc/hosts file.
>> Also, make sure your cluster network interface is up on all nodes and
>> that the /etc/cluster/cluster.conf files are the same on all nodes.
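>> A quick sanity check, for example (run on every node; the checksums
>> should match everywhere):
>>
>> [root@csarcsys1-eth0 ~]# cksum /etc/cluster/cluster.conf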
>>
>>
>>
>> Dalton, Maurice wrote:
>>
>>
>>> The last post is incorrect.
>>>
>>> Fence is still hanging at start up.
>>>
>>> Here's another log message.
>>>
>>> Mar 24 19:03:14 csarcsys3-eth0 ccsd[6425]: Error while processing
>>> connect: Connection refused
>>>
>>> Mar 24 19:03:15 csarcsys3-eth0 dlm_controld[6453]: connect to ccs
>>> error -111, check ccsd or cluster status
>>>
>>> From: linux-cluster-bounces@xxxxxxxxxx
>>> [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bennie Thomas
>>> Sent: Monday, March 24, 2008 11:22 AM
>>> To: linux clustering
>>> Subject: Re: 3 node cluster problems
>>>
>>> Try removing the fully qualified hostname from the cluster.conf file.
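>>>
>>> For illustration only, a sketch of the change, i.e.
>>>
>>> <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
>>>
>>> instead of
>>>
>>> <clusternode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" nodeid="1" votes="1">
>>>
>>> with the short names resolvable from /etc/hosts on every node.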
>>>
>>>
>>> Dalton, Maurice wrote:
>>>
>>> I have NO fencing equipment.
>>>
>>> I have been tasked to set up a 3-node cluster.
>>>
>>> Currently I am having problems getting cman (fence) to start.
>>>
>>> Fence will try to start up during cman startup, but it will fail.
>>>
>>> I tried to run /sbin/fenced -D, and I get the following:
>>>
>>> 1206373475 cman_init error 0 111
>>>
>>> Here's my cluster.conf file:
>>>
>>> <?xml version="1.0"?>
>>> <cluster alias="csarcsys51" config_version="26" name="csarcsys51">
>>>         <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>>>         <clusternodes>
>>>                 <clusternode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" nodeid="1" votes="1">
>>>                         <fence/>
>>>                 </clusternode>
>>>                 <clusternode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" nodeid="2" votes="1">
>>>                         <fence/>
>>>                 </clusternode>
>>>                 <clusternode name="csarcsys3-eth0.xxx.xxxxnasa.gov" nodeid="3" votes="1">
>>>                         <fence/>
>>>                 </clusternode>
>>>         </clusternodes>
>>>         <cman/>
>>>         <fencedevices/>
>>>         <rm>
>>>                 <failoverdomains>
>>>                         <failoverdomain name="csarcsys-fo" ordered="1" restricted="0">
>>>                                 <failoverdomainnode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" priority="1"/>
>>>                                 <failoverdomainnode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" priority="1"/>
>>>                                 <failoverdomainnode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" priority="1"/>
>>>                         </failoverdomain>
>>>                 </failoverdomains>
>>>                 <resources>
>>>                         <ip address="xxx.xxx.xxx.xxx" monitor_link="1"/>
>>>                         <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
>>>                         <nfsexport name="csarcsys-export"/>
>>>                         <nfsclient name="csarcsys-nfs-client" options="no_root_squash,rw" path="/csarc-test" target="xxx.xxx.xxx.*"/>
>>>                 </resources>
>>>         </rm>
>>> </cluster>
>>>
>>> Messages from the logs
>>>
>>> Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate.
>>> Refusing connection.
>>>
>>> Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Error while processing
>>> connect: Connection refused
>>>
>>> Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate.
>>> Refusing connection.
>>>
>>> Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Error while processing
>>> connect: Connection refused
>>>
>>> Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate.
>>> Refusing connection.
>>>
>>> Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Error while processing
>>> connect: Connection refused
>>>
>>> Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate.
>>> Refusing connection.
>>>
>>> Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Error while processing
>>> connect: Connection refused
>>>
>>> Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate.
>>> Refusing connection.
>>>
>>> Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Error while processing
>>> connect: Connection refused
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster