Hi Digimer,

No, we're not supporting multicast. I'm trying to use broadcast, but Red Hat support is saying it is better to use transport=udpu, which I did set, and it is complaining about a timeout. I did try to set broadcast, but somehow that didn't work either. Let me give broadcast a try again.

Thanks,
Vinh

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Digimer
Sent: Wednesday, January 07, 2015 5:51 PM
To: linux clustering
Subject: Re: needs helps GFS2 on 5 nodes cluster

It looks like a network problem... Does your (virtual) switch support multicast properly, and have you opened up the proper ports in the firewall?

On 07/01/15 05:32 PM, Cao, Vinh wrote:
> Hi Digimer,
>
> Yes, I just did. Looks like they are failing. I'm not sure why that is.
> Please see the attachment for all servers' logs.
>
> By the way, I do appreciate all the help I can get.
>
> Vinh
>
> -----Original Message-----
> From: linux-cluster-bounces@xxxxxxxxxx
> [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Digimer
> Sent: Wednesday, January 07, 2015 4:33 PM
> To: linux clustering
> Subject: Re: needs helps GFS2 on 5 nodes cluster
>
> Quorum is enabled by default. I need to see the entire logs from all five nodes, as I mentioned in the first email. Please disable cman from starting on boot, configure fencing properly and then reboot all nodes cleanly. Start 'tail -f -n 0 /var/log/messages' on all five nodes, then in another window, start cman on all five nodes. When things settle down, copy/paste all the log output please.
>
> On 07/01/15 04:29 PM, Cao, Vinh wrote:
>> Hi Digimer,
>>
>> Here is from the logs:
>> [root@ustlvcmsp1954 ~]# tail -f /var/log/messages
>> Jan  7 16:14:01 ustlvcmsp1954 corosync[8182]:   [SERV  ] Service engine loaded: corosync profile loading service
>> Jan  7 16:14:01 ustlvcmsp1954 corosync[8182]:   [QUORUM] Using quorum provider quorum_cman
>> Jan  7 16:14:01 ustlvcmsp1954 corosync[8182]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
>> Jan  7 16:14:01 ustlvcmsp1954 corosync[8182]:   [MAIN  ] Compatibility mode set to whitetank.  Using V1 and V2 of the synchronization engine.
>> Jan  7 16:14:01 ustlvcmsp1954 corosync[8182]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
>> Jan  7 16:14:01 ustlvcmsp1954 corosync[8182]:   [QUORUM] Members[1]: 1
>> Jan  7 16:14:01 ustlvcmsp1954 corosync[8182]:   [QUORUM] Members[1]: 1
>> Jan  7 16:14:01 ustlvcmsp1954 corosync[8182]:   [CPG   ] chosen downlist: sender r(0) ip(10.30.197.108) ; members(old:0 left:0)
>> Jan  7 16:14:01 ustlvcmsp1954 corosync[8182]:   [MAIN  ] Completed service synchronization, ready to provide service.
>> Jan  7 16:14:01 ustlvcmsp1954 rgmanager[8099]: Waiting for quorum to form
>> Jan  7 16:15:06 ustlvcmsp1954 corosync[8182]:   [SERV  ] Unloading all Corosync service engines.
>> Jan  7 16:15:06 ustlvcmsp1954 corosync[8182]:   [SERV  ] Service engine unloaded: corosync extended virtual synchrony service
>> Jan  7 16:15:06 ustlvcmsp1954 corosync[8182]:   [SERV  ] Service engine unloaded: corosync configuration service
>> Jan  7 16:15:06 ustlvcmsp1954 corosync[8182]:   [SERV  ] Service engine unloaded: corosync cluster closed process group service v1.01
>> Jan  7 16:15:06 ustlvcmsp1954 corosync[8182]:   [SERV  ] Service engine unloaded: corosync cluster config database access v1.01
>> Jan  7 16:15:06 ustlvcmsp1954 corosync[8182]:   [SERV  ] Service engine unloaded: corosync profile loading service
>> Jan  7 16:15:06 ustlvcmsp1954 corosync[8182]:   [SERV  ] Service engine unloaded: openais checkpoint service B.01.01
>> Jan  7 16:15:06 ustlvcmsp1954 corosync[8182]:   [SERV  ] Service engine unloaded: corosync CMAN membership service 2.90
>> Jan  7 16:15:06 ustlvcmsp1954 corosync[8182]:   [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
>> Jan  7 16:15:06 ustlvcmsp1954 corosync[8182]:   [MAIN  ] Corosync Cluster Engine exiting with status 0 at main.c:2055.
>> Jan  7 16:15:06 ustlvcmsp1954 rgmanager[8099]: Quorum formed
>>
>> Then it dies at:
>>    Starting cman...                                        [  OK  ]
>>    Waiting for quorum... Timed-out waiting for cluster
>>                                                            [FAILED]
>>
>> Yes, I did make the change with <fence_daemon post_join_delay="30"/>; the problem is still there. One thing I don't understand is why the cluster is looking for quorum; I don't have any quorum disk set up in the cluster.conf file.
>>
>> Any help I can get is appreciated.
>>
>> Vinh
>>
>> -----Original Message-----
>> From: linux-cluster-bounces@xxxxxxxxxx
>> [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Digimer
>> Sent: Wednesday, January 07, 2015 3:59 PM
>> To: linux clustering
>> Subject: Re: needs helps GFS2 on 5 nodes cluster
>>
>> On 07/01/15 03:39 PM, Cao, Vinh wrote:
>>> Hello Digimer,
>>>
>>> Yes, I would agree with you that RHEL 6.4 is old. We patch monthly, but I'm not sure why these servers are still at 6.4. Most of our systems are 6.6.
>>>
>>> Here is my cluster config. All I want is to use the cluster to have GFS2 mounted via /etc/fstab.
>>> [root@ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf
>>> <?xml version="1.0"?>
>>> <cluster config_version="15" name="p1954_to_p1958">
>>>         <clusternodes>
>>>                 <clusternode name="ustlvcmsp1954" nodeid="1"/>
>>>                 <clusternode name="ustlvcmsp1955" nodeid="2"/>
>>>                 <clusternode name="ustlvcmsp1956" nodeid="3"/>
>>>                 <clusternode name="ustlvcmsp1957" nodeid="4"/>
>>>                 <clusternode name="ustlvcmsp1958" nodeid="5"/>
>>>         </clusternodes>
>>
>> You don't configure the fencing for the nodes... If anything causes a fence, the cluster will lock up (by design).
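(For reference, wiring each node to one of the fence devices defined just below would look roughly like the sketch here. It is only a sketch: the port= value, which names the virtual machine as vCenter knows it, and the ssl option are assumptions about the vSphere side; the device name= must match a <fencedevice> entry.)

        <clusternode name="ustlvcmsp1954" nodeid="1">
                <fence>
                        <method name="1">
                                <!-- name= matches the fencedevice defined below; port= is the
                                     VM name in vCenter (assumed); ssl= as your vCenter requires -->
                                <device name="p1954" port="ustlvcmsp1954" ssl="on"/>
                        </method>
                </fence>
        </clusternode>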
>>
>>>         <fencedevices>
>>>                 <fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.108" login="rhfence" name="p1954" passwd="xxxxxxxx"/>
>>>                 <fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.109" login="rhfence" name="p1955" passwd="xxxxxxxx"/>
>>>                 <fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.110" login="rhfence" name="p1956" passwd="xxxxxxxx"/>
>>>                 <fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.111" login="rhfence" name="p1957" passwd="xxxxxxxx"/>
>>>                 <fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.112" login="rhfence" name="p1958" passwd="xxxxxxxx"/>
>>>         </fencedevices>
>>> </cluster>
>>>
>>> clustat shows:
>>>
>>> Cluster Status for p1954_to_p1958 @ Wed Jan  7 15:38:00 2015
>>> Member Status: Quorate
>>>
>>>  Member Name                             ID   Status
>>>  ------ ----                             ---- ------
>>>  ustlvcmsp1954                              1 Offline
>>>  ustlvcmsp1955                              2 Online, Local
>>>  ustlvcmsp1956                              3 Online
>>>  ustlvcmsp1957                              4 Offline
>>>  ustlvcmsp1958                              5 Online
>>>
>>> I need to get them all online so I can use fencing for mounting the shared disk.
>>>
>>> Thanks,
>>> Vinh
>>
>> What about the log entries from the start-up? Did you try the post_join_delay config?
>>
>>> -----Original Message-----
>>> From: linux-cluster-bounces@xxxxxxxxxx
>>> [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Digimer
>>> Sent: Wednesday, January 07, 2015 3:16 PM
>>> To: linux clustering
>>> Subject: Re: needs helps GFS2 on 5 nodes cluster
>>>
>>> My first thought would be to set <fence_daemon post_join_delay="30"/> in cluster.conf.
>>>
>>> If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well.
>>>
>>> Also, 6.4 is pretty old, why not upgrade to 6.6?
>>>
>>> digimer
>>>
>>> On 07/01/15 03:10 PM, Cao, Vinh wrote:
>>>> Hello Cluster guru,
>>>>
>>>> I'm trying to set up a Red Hat 6.4 OS cluster with 5 nodes. With two nodes I don't have any issue.
>>>>
>>>> But with 5 nodes, when I run clustat I get 3 nodes online and the other two offline.
>>>>
>>>> When I start one of the offline nodes with 'service cman start', I get:
>>>>
>>>> [root@ustlvcmspxxx ~]# service cman status
>>>> corosync is stopped
>>>>
>>>> [root@ustlvcmsp1954 ~]# service cman start
>>>> Starting cluster:
>>>>    Checking if cluster has been disabled at boot...        [  OK  ]
>>>>    Checking Network Manager...                             [  OK  ]
>>>>    Global setup...                                         [  OK  ]
>>>>    Loading kernel modules...                               [  OK  ]
>>>>    Mounting configfs...                                    [  OK  ]
>>>>    Starting cman...                                        [  OK  ]
>>>>    Waiting for quorum... Timed-out waiting for cluster
>>>>                                                            [FAILED]
>>>> Stopping cluster:
>>>>    Leaving fence domain...                                 [  OK  ]
>>>>    Stopping gfs_controld...                                [  OK  ]
>>>>    Stopping dlm_controld...                                [  OK  ]
>>>>    Stopping fenced...                                      [  OK  ]
>>>>    Stopping cman...                                        [  OK  ]
>>>>    Waiting for corosync to shutdown:                       [  OK  ]
>>>>    Unloading kernel modules...                             [  OK  ]
>>>>    Unmounting configfs...                                  [  OK  ]
>>>>
>>>> Can you help?
>>>>
>>>> Thank you,
>>>>
>>>> Vinh
>>>
>>> --
>>> Digimer
>>> Papers and Projects: https://alteeve.ca/w/
>>> What if the cure for cancer is trapped in the mind of a person without access to education?
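For what it's worth, the transport and fence-daemon settings discussed in this thread are top-level elements in cluster.conf. Below is a minimal sketch, assuming the stock RHEL 6 cman schema: transport="udpu" needs cman/corosync from 6.2 or later, only one transport mode should be set at a time, the bumped config_version is just an example, and the updated file has to be distributed to all five nodes.

        <cluster config_version="16" name="p1954_to_p1958">
                <!-- UDP unicast instead of multicast (needs 6.2 or later); -->
                <!-- to try broadcast instead, use <cman broadcast="yes"/> -->
                <cman transport="udpu"/>
                <fence_daemon post_join_delay="30"/>
                <clusternodes>
                        <!-- the five clusternode entries, each with a <fence> block -->
                </clusternodes>
                <fencedevices>
                        <!-- the fence_vmware_soap devices as already defined -->
                </fencedevices>
        </cluster>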
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster@xxxxxxxxxx
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without access to education?
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
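Since the thread keeps coming back to the network question, one quick check on each node is whether the cluster ports are open between the five machines. A sketch, assuming the RHEL 6 default port numbers and a stock iptables setup:

        # corosync totem traffic (UDP 5404-5405) and dlm for GFS2 (TCP 21064), RHEL 6 defaults
        iptables -I INPUT -p udp -m udp --dport 5404:5405 -j ACCEPT
        iptables -I INPUT -p tcp -m tcp --dport 21064 -j ACCEPT
        service iptables save        # persist the rules across reboots

        # then confirm what cman/corosync is actually using and who is a member
        cman_tool status | grep -i address
        cman_tool nodes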