Please configure fencing. If you don't, it _will_ cause you problems.
On 07/01/15 09:48 PM, Cao, Vinh wrote:
Hi Digimer,
No, we're not supporting multicast. I'm trying to use broadcast, but Red Hat support says it's better to use transport=udpu. I did set that, and it still complains about a timeout.
I did try to set broadcast, but somehow it didn't work either.
Let me give broadcast a try again.
Thanks,
Vinh
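For reference, a minimal sketch of how the transport is usually selected in a RHEL 6 cluster.conf; as I understand it, the attribute goes on the <cman> element, so verify against your corosync/cman version before relying on it:

    <!-- UDP unicast, what Red Hat support suggested: -->
    <cman transport="udpu"/>

    <!-- or broadcast instead of multicast: -->
    <cman broadcast="yes"/>

    <!-- remember to bump config_version whenever cluster.conf changes -->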
-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Digimer
Sent: Wednesday, January 07, 2015 5:51 PM
To: linux clustering
Subject: Re: needs helps GFS2 on 5 nodes cluster
It looks like a network problem... Does your (virtual) switch support multicast properly and have you opened up the proper ports in the firewall?
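For reference, the ports commonly cited for RHEL 6 clustering are UDP 5404/5405 for corosync/cman, TCP 11111 for ricci, TCP 21064 for dlm and TCP 16851 for modclusterd; a rough iptables sketch (adjust to your environment) would be:

    iptables -I INPUT -p udp -m state --state NEW -m multiport --dports 5404,5405 -j ACCEPT
    iptables -I INPUT -p tcp -m state --state NEW -m multiport --dports 11111,21064,16851 -j ACCEPT
    service iptables save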
On 07/01/15 05:32 PM, Cao, Vinh wrote:
Hi Digimer,
Yes, I just did. Looks like they are failing. I'm not sure why that is.
Please see the attachment for all the servers' logs.
By the way, I appreciate all the help I can get.
Vinh
-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Digimer
Sent: Wednesday, January 07, 2015 4:33 PM
To: linux clustering
Subject: Re: needs helps GFS2 on 5 nodes cluster
Quorum is enabled by default. I need to see the entire logs from all five nodes, as I mentioned in the first email. Please disable cman from starting on boot, configure fencing properly and then reboot all nodes cleanly. Start the 'tail -f -n 0 /var/log/messages' on all five nodes, then in another window, start cman on all five nodes. When things settle down, copy/paste all the log output please.
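In shell terms, that procedure amounts to roughly the following on each of the five nodes:

    chkconfig cman off              # keep cman from starting at boot
    reboot
    # after the reboot, in one terminal per node:
    tail -f -n 0 /var/log/messages
    # then, in a second terminal, start cman on all five nodes at about the same time:
    service cman start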
On 07/01/15 04:29 PM, Cao, Vinh wrote:
Hi Digimer,
Here is from the logs:
[root@ustlvcmsp1954 ~]# tail -f /var/log/messages
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync profile loading service
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Using quorum provider quorum_cman
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [CPG ] chosen downlist: sender r(0) ip(10.30.197.108) ; members(old:0 left:0)
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 7 16:14:01 ustlvcmsp1954 rgmanager[8099]: Waiting for quorum to form
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Unloading all Corosync service engines.
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync configuration service
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync profile loading service
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync CMAN membership service 2.90
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:2055.
Jan 7 16:15:06 ustlvcmsp1954 rgmanager[8099]: Quorum formed
Then it dies at:
Starting cman... [ OK ]
Waiting for quorum... Timed-out waiting for cluster
[FAILED]
Yes, I did make the change with <fence_daemon post_join_delay="30"/>, but the problem is still there. One thing I don't understand is why the cluster is looking for quorum.
I don't have any quorum disk set up in the cluster.conf file.
Any help I can get is appreciated.
Vinh
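When a node hangs at "Waiting for quorum...", a quick way to see what the cluster thinks the membership looks like, run from any node where cman did come up, is:

    cman_tool status    # expected votes, total votes, quorum
    cman_tool nodes     # which members this node can actually see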
-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Digimer
Sent: Wednesday, January 07, 2015 3:59 PM
To: linux clustering
Subject: Re: needs helps GFS2 on 5 nodes cluster
On 07/01/15 03:39 PM, Cao, Vinh wrote:
Hello Digimer,
Yes, I would agree with you that RHEL 6.4 is old. We patch monthly, but I'm not sure why these servers are still at 6.4. Most of our systems are at 6.6.
Here is my cluster config. All I want is to use the cluster to have GFS2 mounted via /etc/fstab.
[root@ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="15" name="p1954_to_p1958">
<clusternodes>
<clusternode name="ustlvcmsp1954" nodeid="1"/>
<clusternode name="ustlvcmsp1955" nodeid="2"/>
<clusternode name="ustlvcmsp1956" nodeid="3"/>
<clusternode name="ustlvcmsp1957" nodeid="4"/>
<clusternode name="ustlvcmsp1958" nodeid="5"/>
</clusternodes>
You don't configure the fencing for the nodes... If anything causes a fence, the cluster will lock up (by design).
<fencedevices>
<fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.108" login="rhfence" name="p1954" passwd="xxxxxxxx"/>
<fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.109" login="rhfence" name="p1955" passwd=" xxxxxxxx "/>
<fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.110" login="rhfence" name="p1956" passwd=" xxxxxxxx "/>
<fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.111" login="rhfence" name="p1957" passwd=" xxxxxxxx "/>
<fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.112" login="rhfence" name="p1958" passwd=" xxxxxxxx "/>
</fencedevices>
</cluster>
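For comparison, wiring one of the defined fence devices to its node would look roughly like the sketch below. The port value is an assumption: for fence_vmware_soap it is usually the VM name or UUID as known to the vCenter/ESXi host that ipaddr points at, and ssl="on" assumes the SOAP endpoint is HTTPS.

    <clusternode name="ustlvcmsp1954" nodeid="1">
        <fence>
            <method name="vmware">
                <device name="p1954" port="ustlvcmsp1954" ssl="on"/>
            </method>
        </fence>
    </clusternode>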
clustat show:
Cluster Status for p1954_to_p1958 @ Wed Jan 7 15:38:00 2015
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
ustlvcmsp1954 1 Offline
ustlvcmsp1955 2 Online, Local
ustlvcmsp1956 3 Online
ustlvcmsp1957 4 Offline
ustlvcmsp1958 5 Online
I need to get them all online so I can use fencing and mount the shared disk.
Thanks,
Vinh
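For what it's worth, once the cluster is stable the /etc/fstab entry would look something like the line below (the device path and mount point are placeholders), with the gfs2 init script handling mount order after cman (and clvmd, if clustered LVM is used) has started:

    /dev/vg_shared/lv_gfs2  /mnt/gfs2  gfs2  defaults,noatime  0 0

    chkconfig gfs2 on    # the gfs2 service mounts GFS2 entries from /etc/fstab at the right point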
What about the log entries from the start-up? Did you try the post_join_delay config?
-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Digimer
Sent: Wednesday, January 07, 2015 3:16 PM
To: linux clustering
Subject: Re: needs helps GFS2 on 5 nodes cluster
My first thought would be to set <fence_daemon post_join_delay="30" /> in cluster.conf.
If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well.
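After editing cluster.conf (the fence_daemon line goes directly under <cluster>, and config_version must be bumped), a sanity check and push would look roughly like:

    ccs_config_validate       # validate the XML against the cluster schema
    cman_tool version -r      # propagate the new config_version to the running nodes
    # if the cluster isn't up yet, copy the file to all five nodes by hand instead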
Also, 6.4 is pretty old; why not upgrade to 6.6?
digimer
On 07/01/15 03:10 PM, Cao, Vinh wrote:
Hello Cluster guru,
I'm trying to set up a Red Hat 6.4 cluster with 5 nodes. With two nodes I don't have any issue.
But with 5 nodes, when I ran clustat I got 3 nodes online and the other two offline.
When I start one of the offline nodes with 'service cman start', I get:
[root@ustlvcmspxxx ~]# service cman status
corosync is stopped
[root@ustlvcmsp1954 ~]# service cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
Waiting for quorum... Timed-out waiting for cluster
[FAILED]
Stopping cluster:
Leaving fence domain... [ OK ]
Stopping gfs_controld... [ OK ]
Stopping dlm_controld... [ OK ]
Stopping fenced... [ OK ]
Stopping cman... [ OK ]
Waiting for corosync to shutdown: [ OK ]
Unloading kernel modules... [ OK ]
Unmounting configfs... [ OK ]
Can you help?
Thank you,
Vinh
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster