Re: needs helps GFS2 on 5 nodes cluster

Hello Digimer,

The problem is solved. First of all, I just want to thank you for taking the time to stay with me on this issue. You were also right about fencing.
Here is how it broke down (a rough sketch of the commands follows the list):

1. When I created the cluster, I forgot that I had not yet joined these systems to it. It has been a long while since I last set up a cluster, and although I wrote documentation about all of this, I did not follow it to the letter. So I had to run cman_tool join on every node. That was the key.
2. After joining all the nodes to the cluster, I was able to start cman via: service cman start
3. Then configure fencing.
4. Then add a static entry for the mount device to /etc/fstab.
5. Then reboot each node, one at a time. They all came back up and are healthy.
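
For the record, here is roughly what that sequence looked like on each node. The logical volume, mount point, and VM name below are placeholders rather than our real ones, and the fence test simply calls the agent that is already in our cluster.conf:

  # steps 1 and 2: join the node to the cluster, then start cman
  cman_tool join
  service cman start

  # step 3: sanity-check fencing against one node (see "fence_vmware_soap -h" for the options)
  fence_vmware_soap -a 10.30.197.108 -z -l rhfence -p '<password>' -n <vm-name> -o status

  # step 4: static GFS2 entry in /etc/fstab, then mount everything of that type
  #   /dev/vg_shared/lv_gfs2   /gfs2data   gfs2   defaults,noatime   0 0
  mount -a -t gfs2

  # step 5: make sure cman and gfs2 start on boot before rebooting the nodes one at a time
  chkconfig cman on
  chkconfig gfs2 on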

I do still see this error in the logs (I think it means multicast is not being used; I'm using broadcast for now. If multicast were not blocked on our network, I believe this error would go away. That is just my thought; a rough check is sketched below the log lines):

[TOTEM ] Received message has invalid digest... ignoring.
Jan  8 08:34:33 ustlvcmsp1956 corosync[21194]:   [TOTEM ] Invalid packet data
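
If we ever move back to multicast, this is roughly how I plan to check whether the totem traffic is getting through. The interface name is a guess for our boxes; the ports are corosync's defaults (UDP 5404 and 5405):

  # open the corosync ports on every node (or add the equivalent permanent rule)
  iptables -I INPUT -p udp --dport 5404:5405 -j ACCEPT

  # watch for totem packets arriving from the other nodes
  tcpdump -n -i eth0 udp port 5404 or udp port 5405

  # if the omping package is installed, run it on all nodes at the same time to test multicast end to end
  omping ustlvcmsp1954 ustlvcmsp1955 ustlvcmsp1956 ustlvcmsp1957 ustlvcmsp1958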

Again, thanks for your help.
Vinh

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Digimer
Sent: Thursday, January 08, 2015 2:02 AM
To: linux clustering
Subject: Re:  needs helps GFS2 on 5 nodes cluster

Please configure fencing. If you don't, it _will_ cause you problems.

On 07/01/15 09:48 PM, Cao, Vinh wrote:
> Hi Digimer,
>
> No, we're not supporting multicast. I'm trying to use broadcast, but Red Hat support says it's better to use transport="udpu", which I did set, and it still complained about a timeout.
> I did try to set broadcast, but somehow it didn't work either.
>
> Let me give broadcast a try again.
>
> Thanks,
> Vinh
>
> -----Original Message-----
> From: linux-cluster-bounces@xxxxxxxxxx 
> [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Digimer
> Sent: Wednesday, January 07, 2015 5:51 PM
> To: linux clustering
> Subject: Re:  needs helps GFS2 on 5 nodes cluster
>
> It looks like a network problem... Does your (virtual) switch support multicast properly and have you opened up the proper ports in the firewall?
>
> On 07/01/15 05:32 PM, Cao, Vinh wrote:
>> Hi Digimer,
>>
>> Yes, I just did. Looks like they are failing. I'm not sure why that is.
>> Please see the attachment for all servers log.
>>
>> By the way, I appreciate all the help I can get.
>>
>> Vinh
>>
>> -----Original Message-----
>> From: linux-cluster-bounces@xxxxxxxxxx 
>> [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Digimer
>> Sent: Wednesday, January 07, 2015 4:33 PM
>> To: linux clustering
>> Subject: Re:  needs helps GFS2 on 5 nodes cluster
>>
>> Quorum is enabled by default. I need to see the entire logs from all five nodes, as I mentioned in the first email. Please disable cman from starting on boot, configure fencing properly and then reboot all nodes cleanly. Start the 'tail -f -n 0 /var/log/messages' on all five nodes, then in another window, start cman on all five nodes. When things settle down, copy/paste all the log output please.
>>
>> On 07/01/15 04:29 PM, Cao, Vinh wrote:
>>> Hi Digimer,
>>>
>>> Here is from the logs:
>>> [root@ustlvcmsp1954 ~]# tail -f /var/log/messages
>>> Jan  7 16:14:01 ustlvcmsp1954 corosync[8182]:   [SERV  ] Service engine loaded: corosync profile loading service
>>> Jan  7 16:14:01 ustlvcmsp1954 corosync[8182]:   [QUORUM] Using quorum provider quorum_cman
>>> Jan  7 16:14:01 ustlvcmsp1954 corosync[8182]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
>>> Jan  7 16:14:01 ustlvcmsp1954 corosync[8182]:   [MAIN  ] Compatibility mode set to whitetank.  Using V1 and V2 of the synchronization engine.
>>> Jan  7 16:14:01 ustlvcmsp1954 corosync[8182]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
>>> Jan  7 16:14:01 ustlvcmsp1954 corosync[8182]:   [QUORUM] Members[1]: 1
>>> Jan  7 16:14:01 ustlvcmsp1954 corosync[8182]:   [QUORUM] Members[1]: 1
>>> Jan  7 16:14:01 ustlvcmsp1954 corosync[8182]:   [CPG   ] chosen downlist: sender r(0) ip(10.30.197.108) ; members(old:0 left:0)
>>> Jan  7 16:14:01 ustlvcmsp1954 corosync[8182]:   [MAIN  ] Completed service synchronization, ready to provide service.
>>> Jan  7 16:14:01 ustlvcmsp1954 rgmanager[8099]: Waiting for quorum to form
>>> Jan  7 16:15:06 ustlvcmsp1954 corosync[8182]:   [SERV  ] Unloading all Corosync service engines.
>>> Jan  7 16:15:06 ustlvcmsp1954 corosync[8182]:   [SERV  ] Service engine unloaded: corosync extended virtual synchrony service
>>> Jan  7 16:15:06 ustlvcmsp1954 corosync[8182]:   [SERV  ] Service engine unloaded: corosync configuration service
>>> Jan  7 16:15:06 ustlvcmsp1954 corosync[8182]:   [SERV  ] Service engine unloaded: corosync cluster closed process group service v1.01
>>> Jan  7 16:15:06 ustlvcmsp1954 corosync[8182]:   [SERV  ] Service engine unloaded: corosync cluster config database access v1.01
>>> Jan  7 16:15:06 ustlvcmsp1954 corosync[8182]:   [SERV  ] Service engine unloaded: corosync profile loading service
>>> Jan  7 16:15:06 ustlvcmsp1954 corosync[8182]:   [SERV  ] Service engine unloaded: openais checkpoint service B.01.01
>>> Jan  7 16:15:06 ustlvcmsp1954 corosync[8182]:   [SERV  ] Service engine unloaded: corosync CMAN membership service 2.90
>>> Jan  7 16:15:06 ustlvcmsp1954 corosync[8182]:   [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
>>> Jan  7 16:15:06 ustlvcmsp1954 corosync[8182]:   [MAIN  ] Corosync Cluster Engine exiting with status 0 at main.c:2055.
>>> Jan  7 16:15:06 ustlvcmsp1954 rgmanager[8099]: Quorum formed
>>>
>>> Then it dies at:
>>>     Starting cman...                                        [  OK  ]
>>>       Waiting for quorum... Timed-out waiting for cluster   [FAILED]
>>>
>>> Yes, I did make the change with <fence_daemon post_join_delay="30"/>, but the problem is still there. One thing I don't understand is why the cluster is looking for quorum:
>>> I don't have any quorum disk set up in the cluster.conf file.
>>>
>>> Any help I can get is appreciated.
>>>
>>> Vinh
>>>
>>> -----Original Message-----
>>> From: linux-cluster-bounces@xxxxxxxxxx 
>>> [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Digimer
>>> Sent: Wednesday, January 07, 2015 3:59 PM
>>> To: linux clustering
>>> Subject: Re:  needs helps GFS2 on 5 nodes cluster
>>>
>>> On 07/01/15 03:39 PM, Cao, Vinh wrote:
>>>> Hello Digimer,
>>>>
>>>> Yes, I agree with you that RHEL 6.4 is old. We patch monthly, but I'm not sure why these servers are still at 6.4. Most of our systems are on 6.6.
>>>>
>>>> Here is my cluster config. All I want is to use the cluster to mount GFS2 via /etc/fstab.
>>>> [root@ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf
>>>> <?xml version="1.0"?>
>>>> <cluster config_version="15" name="p1954_to_p1958">
>>>>             <clusternodes>
>>>>                     <clusternode name="ustlvcmsp1954" nodeid="1"/>
>>>>                     <clusternode name="ustlvcmsp1955" nodeid="2"/>
>>>>                     <clusternode name="ustlvcmsp1956" nodeid="3"/>
>>>>                     <clusternode name="ustlvcmsp1957" nodeid="4"/>
>>>>                     <clusternode name="ustlvcmsp1958" nodeid="5"/>
>>>>             </clusternodes>
>>>
>>> You don't configure the fencing for the nodes... If anything causes a fence, the cluster will lock up (by design).
>>>
>>>>             <fencedevices>
>>>>                     <fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.108" login="rhfence" name="p1954" passwd="xxxxxxxx"/>
>>>>                     <fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.109" login="rhfence" name="p1955" passwd=" xxxxxxxx "/>
>>>>                     <fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.110" login="rhfence" name="p1956" passwd=" xxxxxxxx "/>
>>>>                     <fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.111" login="rhfence" name="p1957" passwd=" xxxxxxxx "/>
>>>>                     <fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.112" login="rhfence" name="p1958" passwd=" xxxxxxxx "/>
>>>>             </fencedevices>
>>>> </cluster>
>>>>
>>>> clustat shows:
>>>>
>>>> Cluster Status for p1954_to_p1958 @ Wed Jan  7 15:38:00 2015
>>>> Member Status: Quorate
>>>>
>>>>      Member Name                                                     ID   Status
>>>>      ------ ----                                                     ---- ------
>>>>      ustlvcmsp1954                                                       1 Offline
>>>>      ustlvcmsp1955                                                       2 Online, Local
>>>>      ustlvcmsp1956                                                       3 Online
>>>>      ustlvcmsp1957                                                       4 Offline
>>>>      ustlvcmsp1958                                                       5 Online
>>>>
>>>> I need to get them all online so I can set up fencing and mount the shared disk.
>>>>
>>>> Thanks,
>>>> Vinh
>>>
>>> What about the log entries from the start-up? Did you try the post_join_delay config?
>>>
>>>
>>>> -----Original Message-----
>>>> From: linux-cluster-bounces@xxxxxxxxxx 
>>>> [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Digimer
>>>> Sent: Wednesday, January 07, 2015 3:16 PM
>>>> To: linux clustering
>>>> Subject: Re:  needs helps GFS2 on 5 nodes cluster
>>>>
>>>> My first thought would be to set <fence_daemon post_join_delay="30" /> in cluster.conf.
>>>>
>>>> If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well.
>>>>
>>>> Also, 6.4 is pretty old, why not upgrade to 6.6?
>>>>
>>>> digimer
>>>>
>>>> On 07/01/15 03:10 PM, Cao, Vinh wrote:
>>>>> Hello Cluster guru,
>>>>>
>>>>> I'm trying to set up a Red Hat 6.4 OS cluster with 5 nodes. With two
>>>>> nodes I don't have any issue.
>>>>>
>>>>> But with 5 nodes, when I run clustat I get 3 nodes online and the
>>>>> other two offline.
>>>>>
>>>>> When I start one of the nodes that is offline with 'service cman start', I get:
>>>>>
>>>>> [root@ustlvcmspxxx ~]# service cman status
>>>>>
>>>>> corosync is stopped
>>>>>
>>>>> [root@ustlvcmsp1954 ~]# service cman start
>>>>>
>>>>> Starting cluster:
>>>>>
>>>>>         Checking if cluster has been disabled at boot...        [  OK  ]
>>>>>
>>>>>         Checking Network Manager...                             [  OK  ]
>>>>>
>>>>>         Global setup...                                         [  OK  ]
>>>>>
>>>>>         Loading kernel modules...                               [  OK  ]
>>>>>
>>>>>         Mounting configfs...                                    [  OK  ]
>>>>>
>>>>>         Starting cman...                                        [  OK  ]
>>>>>
>>>>> Waiting for quorum... Timed-out waiting for cluster       [FAILED]
>>>>>
>>>>> Stopping cluster:
>>>>>
>>>>>         Leaving fence domain...                                 [  OK  ]
>>>>>
>>>>>         Stopping gfs_controld...                                [  OK  ]
>>>>>
>>>>>         Stopping dlm_controld...                                [  OK  ]
>>>>>
>>>>>         Stopping fenced...                                      [  OK  ]
>>>>>
>>>>>         Stopping cman...                                        [  OK  ]
>>>>>
>>>>>         Waiting for corosync to shutdown:                       [  OK  ]
>>>>>
>>>>>         Unloading kernel modules...                             [  OK  ]
>>>>>
>>>>>         Unmounting configfs...                                  [  OK  ]
>>>>>
>>>>> Can you help?
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Vinh
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Digimer
>>>> Papers and Projects: https://alteeve.ca/w/
>>>> What if the cure for cancer is trapped in the mind of a person without access to education?
>>>>
>>>> --
>>>> Linux-cluster mailing list
>>>> Linux-cluster@xxxxxxxxxx
>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>>
>>>
>>>
>>
>>
>> --
>> Digimer
>> Papers and Projects: https://alteeve.ca/w/
>> What if the cure for cancer is trapped in the mind of a person without access to education?
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster@xxxxxxxxxx
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>>
>>
>
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without access to education?
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster

-- 
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster


