Hi Digimer,

Here is what the logs show:

[root@ustlvcmsp1954 ~]# tail -f /var/log/messages
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync profile loading service
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Using quorum provider quorum_cman
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [CPG   ] chosen downlist: sender r(0) ip(10.30.197.108) ; members(old:0 left:0)
Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 7 16:14:01 ustlvcmsp1954 rgmanager[8099]: Waiting for quorum to form
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Unloading all Corosync service engines.
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync configuration service
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync profile loading service
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync CMAN membership service 2.90
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:2055.
Jan 7 16:15:06 ustlvcmsp1954 rgmanager[8099]: Quorum formed

Then cman startup dies at:

Starting cman...                                           [ OK ]
Waiting for quorum... Timed-out waiting for cluster        [FAILED]

Yes, I did make the change:

<fence_daemon post_join_delay="30"/>

but the problem is still there.

One thing I don't understand is why the cluster is waiting for quorum at all. I did not set up any quorum disk in the cluster.conf file.

Any help would be appreciated.

Vinh
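P.S. For reference, a minimal sketch of the top of cluster.conf with that change in place. cman derives quorum from node votes, not from a quorum disk (a disk is only involved if a <quorumd> section is configured), so with five one-vote nodes the cluster is quorate once three members see each other. The log above shows Members[1]: 1, i.e. this node only sees itself, which would explain why it times out waiting for quorum. The config_version value here is just an example; it has to be bumped on every edit.

<?xml version="1.0"?>
<cluster config_version="16" name="p1954_to_p1958">
        <!-- Give freshly joined nodes extra time before fenced acts;
             this is the post_join_delay change discussed in this thread. -->
        <fence_daemon post_join_delay="30"/>
        <!-- No <quorumd> block: quorum comes from node votes alone.
             Five nodes x 1 vote each => quorate at 3 members. -->
        <clusternodes>
                <!-- clusternode entries as in the config quoted below -->
        </clusternodes>
        <fencedevices>
                <!-- fencedevice entries as in the config quoted below -->
        </fencedevices>
</cluster>

On a node that did join, 'cman_tool status' should show the Expected votes and Quorum numbers, which makes it easy to confirm that quorum here is vote-based rather than disk-based.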
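Also, Digimer's point further down in the quoted thread is that the fence devices are defined but the <clusternode> entries never reference them. Here is a hedged sketch of what one per-node <fence> block could look like; the port value is a placeholder, since for fence_vmware_soap it would need to be the VM's name (or uuid) exactly as vCenter/ESXi reports it, and ipaddr on the fencedevice normally points at the vCenter/ESXi host that can actually power-cycle the guest.

<clusternode name="ustlvcmsp1954" nodeid="1">
        <fence>
                <method name="vmware">
                        <!-- "port" is an assumed placeholder; replace it with the
                             VM name (or use uuid="...") as known to vCenter. -->
                        <device name="p1954" port="ustlvcmsp1954" ssl="on"/>
                </method>
        </fence>
</clusternode>

The other four clusternode entries would get the same block pointing at their own fencedevice (p1955 through p1958). Once that is in place, running 'fence_node ustlvcmsp1954' from another member is a quick way to confirm fencing actually works before putting GFS2 on top of it.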
-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Digimer
Sent: Wednesday, January 07, 2015 3:59 PM
To: linux clustering
Subject: Re: needs helps GFS2 on 5 nodes cluster

On 07/01/15 03:39 PM, Cao, Vinh wrote:
> Hello Digimer,
>
> Yes, I would agree with you that RHEL 6.4 is old. We patch monthly, but I'm not sure why these servers are still at 6.4. Most of our systems are on 6.6.
>
> Here is my cluster config. All I want is to use the cluster so GFS2 can be mounted via /etc/fstab.
>
> [root@ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf
> <?xml version="1.0"?>
> <cluster config_version="15" name="p1954_to_p1958">
>     <clusternodes>
>         <clusternode name="ustlvcmsp1954" nodeid="1"/>
>         <clusternode name="ustlvcmsp1955" nodeid="2"/>
>         <clusternode name="ustlvcmsp1956" nodeid="3"/>
>         <clusternode name="ustlvcmsp1957" nodeid="4"/>
>         <clusternode name="ustlvcmsp1958" nodeid="5"/>
>     </clusternodes>

You don't configure the fencing for the nodes... If anything causes a fence, the cluster will lock up (by design).

>     <fencedevices>
>         <fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.108" login="rhfence" name="p1954" passwd="xxxxxxxx"/>
>         <fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.109" login="rhfence" name="p1955" passwd="xxxxxxxx"/>
>         <fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.110" login="rhfence" name="p1956" passwd="xxxxxxxx"/>
>         <fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.111" login="rhfence" name="p1957" passwd="xxxxxxxx"/>
>         <fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.112" login="rhfence" name="p1958" passwd="xxxxxxxx"/>
>     </fencedevices>
> </cluster>
>
> clustat shows:
>
> Cluster Status for p1954_to_p1958 @ Wed Jan 7 15:38:00 2015
> Member Status: Quorate
>
>  Member Name          ID   Status
>  ------ ----          ---- ------
>  ustlvcmsp1954        1    Offline
>  ustlvcmsp1955        2    Online, Local
>  ustlvcmsp1956        3    Online
>  ustlvcmsp1957        4    Offline
>  ustlvcmsp1958        5    Online
>
> I need to make them all online, so I can use fencing for mounting the shared disk.
>
> Thanks,
> Vinh

What about the log entries from the start-up? Did you try the post_join_delay config?

> -----Original Message-----
> From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Digimer
> Sent: Wednesday, January 07, 2015 3:16 PM
> To: linux clustering
> Subject: Re: needs helps GFS2 on 5 nodes cluster
>
> My first thought would be to set <fence_daemon post_join_delay="30" /> in cluster.conf.
>
> If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well.
>
> Also, 6.4 is pretty old, why not upgrade to 6.6?
>
> digimer
>
> On 07/01/15 03:10 PM, Cao, Vinh wrote:
>> Hello Cluster gurus,
>>
>> I'm trying to set up a Red Hat 6.4 OS cluster with 5 nodes. With two nodes I don't have any issue.
>>
>> But with 5 nodes, when I run clustat I get 3 nodes online and the other two offline.
>>
>> When I start one of the offline nodes with 'service cman start', I get:
>>
>> [root@ustlvcmspxxx ~]# service cman status
>> corosync is stopped
>>
>> [root@ustlvcmsp1954 ~]# service cman start
>> Starting cluster:
>>    Checking if cluster has been disabled at boot...   [ OK ]
>>    Checking Network Manager...                        [ OK ]
>>    Global setup...                                    [ OK ]
>>    Loading kernel modules...                          [ OK ]
>>    Mounting configfs...                               [ OK ]
>>    Starting cman...                                   [ OK ]
>>    Waiting for quorum... Timed-out waiting for cluster
>>                                                       [FAILED]
>> Stopping cluster:
>>    Leaving fence domain...                            [ OK ]
>>    Stopping gfs_controld...                           [ OK ]
>>    Stopping dlm_controld...                           [ OK ]
>>    Stopping fenced...                                 [ OK ]
>>    Stopping cman...                                   [ OK ]
>>    Waiting for corosync to shutdown:                  [ OK ]
>>    Unloading kernel modules...                        [ OK ]
>>    Unmounting configfs...                             [ OK ]
>>
>> Can you help?
>>
>> Thank you,
>> Vinh
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without access to education?

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster