Quorum is enabled by default. I need to see the entire logs from all
five nodes, as I mentioned in the first email. Please disable cman from
starting on boot, configure fencing properly and then reboot all nodes
cleanly. Run 'tail -f -n 0 /var/log/messages' on all five nodes,
then in another window, start cman on all five nodes. When things settle
down, copy and paste all of the log output, please.
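A rough sketch of those steps on one node, assuming the stock RHEL 6 `chkconfig` and `service` tools. The `run` wrapper and the function names are illustrative only and not part of any cluster package. By default the script just prints each command; set RUN=1 to execute them. The fencing configuration and the clean reboot happen between the first and second steps and are not scripted here.

```shell
#!/bin/sh
# Dry-run by default: each command is printed with a "+ " prefix
# instead of being executed, unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Step 1 (every node): keep cman from starting at boot.
# Configure fencing and reboot cleanly after this step.
disable_cman_at_boot() {
    run chkconfig cman off
}

# Step 2 (every node, first window): follow only new syslog lines.
watch_messages() {
    run tail -f -n 0 /var/log/messages
}

# Step 3 (every node, second window): start cman by hand.
start_cman() {
    run service cman start
}

# In dry-run mode this simply lists the commands in order.
disable_cman_at_boot
watch_messages
start_cman
```

With RUN=1 the functions are meant to be called one at a time in the appropriate window, since `tail -f` does not return on its own.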
On 07/01/15 04:29 PM, Cao, Vinh wrote:
Hi Digimer,
Here is from the logs:
[root@ustlvcmsp1954 ~]# tail -f /var/log/messages
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync profile loading service
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Using quorum provider quorum_cman
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [CPG ] chosen downlist: sender r(0) ip(10.30.197.108) ; members(old:0 left:0)
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 7 16:14:01 ustlvcmsp1954 rgmanager[8099]: Waiting for quorum to form
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Unloading all Corosync service engines.
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync configuration service
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync profile loading service
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync CMAN membership service 2.90
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:2055.
Jan 7 16:15:06 ustlvcmsp1954 rgmanager[8099]: Quorum formed
Then it dies at:
Starting cman... [ OK ]
Waiting for quorum... Timed-out waiting for cluster
[FAILED]
Yes, I did make the change with <fence_daemon post_join_delay="30"/>, but the problem is still there. One thing I don't understand: why is the cluster looking for quorum?
I didn't set up any quorum disk in the cluster.conf file.
Any help I can get would be appreciated.
Vinh
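On the quorum question, a minimal sketch of the majority arithmetic. It assumes each of the five nodes carries one vote and no quorum disk contributes votes; the function names are made up for illustration. Under those assumptions a majority is three votes, so a node that only sees itself (the log above shows `Members[1]: 1`) would stay inquorate.

```shell
#!/bin/sh
# Majority quorum for a given total of expected votes: floor(votes / 2) + 1.
quorum_for() {
    echo $(( $1 / 2 + 1 ))
}

# is_quorate VISIBLE_VOTES EXPECTED_VOTES
# Exit status 0 when the visible votes reach the majority threshold.
is_quorate() {
    [ "$1" -ge "$(quorum_for "$2")" ]
}

echo "quorum for 5 votes: $(quorum_for 5)"
if is_quorate 1 5; then echo "1 of 5: quorate"; else echo "1 of 5: inquorate"; fi
if is_quorate 3 5; then echo "3 of 5: quorate"; else echo "3 of 5: inquorate"; fi
```

This is also consistent with the earlier clustat output, which reported the cluster as Quorate with three of the five members online.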
-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Digimer
Sent: Wednesday, January 07, 2015 3:59 PM
To: linux clustering
Subject: Re: needs helps GFS2 on 5 nodes cluster
On 07/01/15 03:39 PM, Cao, Vinh wrote:
Hello Digimer,
Yes, I would agree with you that RHEL 6.4 is old. We patch monthly, but I'm not sure why these servers are still at 6.4. Most of our systems are at 6.6.
Here is my cluster config. All I want is to use the cluster to mount GFS2 via /etc/fstab.
[root@ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="15" name="p1954_to_p1958">
<clusternodes>
<clusternode name="ustlvcmsp1954" nodeid="1"/>
<clusternode name="ustlvcmsp1955" nodeid="2"/>
<clusternode name="ustlvcmsp1956" nodeid="3"/>
<clusternode name="ustlvcmsp1957" nodeid="4"/>
<clusternode name="ustlvcmsp1958" nodeid="5"/>
</clusternodes>
You haven't configured fencing for the individual nodes. If anything causes a fence, the cluster will lock up (by design).
<fencedevices>
<fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.108" login="rhfence" name="p1954" passwd="xxxxxxxx"/>
<fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.109" login="rhfence" name="p1955" passwd=" xxxxxxxx "/>
<fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.110" login="rhfence" name="p1956" passwd=" xxxxxxxx "/>
<fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.111" login="rhfence" name="p1957" passwd=" xxxxxxxx "/>
<fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.112" login="rhfence" name="p1958" passwd=" xxxxxxxx "/>
</fencedevices>
</cluster>
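For illustration, the per-node fencing that is missing from the configuration above might look roughly like the fragment below. This is a sketch under assumptions, not a tested configuration: the method name "vmware" is arbitrary, the `port` value is a placeholder for the VM's name as the hypervisor knows it, and `config_version` is bumped to 16 only because the version has to increase whenever the file changes. Note also that fence_vmware_soap normally talks to a vCenter or ESXi host, so the `ipaddr` values in the existing `<fencedevices>` entries may need to point there rather than at the nodes themselves.

```xml
<cluster config_version="16" name="p1954_to_p1958">
  <fence_daemon post_join_delay="30"/>
  <clusternodes>
    <clusternode name="ustlvcmsp1954" nodeid="1">
      <fence>
        <method name="vmware">
          <!-- "port" is a placeholder for this node's VM name -->
          <device name="p1954" port="VM-NAME-FOR-1954" ssl="on"/>
        </method>
      </fence>
    </clusternode>
    <!-- repeat for nodes 2 to 5, each referencing its own fencedevice -->
  </clusternodes>
  <!-- existing <fencedevices> section goes here, unchanged -->
</cluster>
```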
clustat shows:
Cluster Status for p1954_to_p1958 @ Wed Jan 7 15:38:00 2015
Member Status: Quorate

 Member Name         ID   Status
 ------ ----         ---- ------
 ustlvcmsp1954       1    Offline
 ustlvcmsp1955       2    Online, Local
 ustlvcmsp1956       3    Online
 ustlvcmsp1957       4    Offline
 ustlvcmsp1958       5    Online
I need to bring them all online so I can use fencing for mounting the shared disk.
Thanks,
Vinh
What about the log entries from the start-up? Did you try the post_join_delay config?
-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Digimer
Sent: Wednesday, January 07, 2015 3:16 PM
To: linux clustering
Subject: Re: needs helps GFS2 on 5 nodes cluster
My first thought would be to set <fence_daemon post_join_delay="30" /> in cluster.conf.
If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well.
Also, 6.4 is pretty old. Why not upgrade to 6.6?
digimer
On 07/01/15 03:10 PM, Cao, Vinh wrote:
Hello Cluster guru,
I'm trying to set up a Red Hat 6.4 cluster with 5 nodes. With two nodes
I don't have any issue.
But with 5 nodes, when I run clustat I get 3 nodes online and the
other two offline.
When I start cman on the ones that are offline with 'service cman start', I get:
[root@ustlvcmspxxx ~]# service cman status
corosync is stopped
[root@ustlvcmsp1954 ~]# service cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
Waiting for quorum... Timed-out waiting for cluster
[FAILED]
Stopping cluster:
Leaving fence domain... [ OK ]
Stopping gfs_controld... [ OK ]
Stopping dlm_controld... [ OK ]
Stopping fenced... [ OK ]
Stopping cman... [ OK ]
Waiting for corosync to shutdown: [ OK ]
Unloading kernel modules... [ OK ]
Unmounting configfs... [ OK ]
Can you help?
Thank you,
Vinh
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster