Re: new cluster acting odd

Thank you for your replies.

The cluster is intended to be 9 nodes, but I haven't finished building
the remaining 2.  Our production cluster is expected to be similar in
size.  What tuning should I be looking at?


Here is a link to our config: http://pastebin.com/LUHM8GQR (I had to
remove the IP addresses).


I tried the suggested method (echo c > /proc/sysrq-trigger) to crash a
node; the cluster kept seeing it as online and never fenced it, yet I
could no longer SSH to the node.  I did this on both a physical box and
a VM with the same result.  I had to run fence_node <node> to get it to
reboot, but it came up split-brained (thinking it was the only one
online).  Now that node has cman down, and the rest of the cluster
still sees it as online.
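
For clarity, the crash-test sequence I'm using looks roughly like this
(fence_tool ls is simply what I use to watch the fence domain from a
surviving node):

# On the victim node: force a kernel crash; fenced should then fence it.
echo c > /proc/sysrq-trigger

# On a surviving node: watch whether fenced notices the failure and acts.
fence_tool ls                          # fence domain state and any victims
tail -f /var/log/cluster/fenced.log    # should show the fence attempt/result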

I thought fencing was working because I'm able to run fence_node <node>
and see the box reboot and come back online.  I did have to get the FC
(Fedora) build of fence-agents because of an issue with the iDRAC agent
not working properly.  We are running fence-agents-3.1.6-1.fc14.x86_64.
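
For checking an agent directly, I believe it can also be invoked by
hand with the standard fence-agents options; a sketch using the values
from our cluster.conf:

fence_drac5 -a x.x.x.82 -l fenceuat -S /etc/cluster/forfencing.sh \
    -c 'admin1->' -x -o status

(-a address, -l login, -S password script, -c command prompt, -x use
SSH, -o action)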


fence_tool dump worked on one of my nodes, but it just hangs on the rest.

[root@map1-uat ~]# fence_tool dump
1417448610 logging mode 3 syslog f 160 p 6 logfile p 6 /var/log/cluster/fenced.log
1417448610 fenced 3.0.12.1 started
1417448610 connected to dbus :1.12
1417448610 cluster node 1 added seq 89048
1417448610 cluster node 2 added seq 89048
1417448610 cluster node 3 added seq 89048
1417448610 cluster node 4 added seq 89048
1417448610 cluster node 5 added seq 89048
1417448610 cluster node 6 added seq 89048
1417448610 cluster node 8 added seq 89048
1417448610 our_nodeid 4 our_name map1-uat.project.domain.com
1417448611 logging mode 3 syslog f 160 p 6 logfile p 6 /var/log/cluster/fenced.log
1417448611 logfile cur mode 100644
1417448611 cpg_join fenced:daemon ...
1417448621 daemon cpg_join error retrying
1417448631 daemon cpg_join error retrying
1417448641 daemon cpg_join error retrying
1417448651 daemon cpg_join error retrying
1417448661 daemon cpg_join error retrying
1417448671 daemon cpg_join error retrying
1417448681 daemon cpg_join error retrying
1417448691 daemon cpg_join error retrying
.
.
.
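
In case it helps with diagnosis, these are the membership views I can
gather on the stuck nodes (cman/corosync 1.x tooling, as I understand
it):

cman_tool status                 # cman's quorum and membership view
cman_tool nodes                  # per-node membership state
group_tool ls                    # fence/dlm group state, including waits
corosync-objctl | grep member    # corosync's own membership objects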


[root@map1-uat ~]# clustat
Cluster Status for gibsuat @ Mon Dec  1 16:51:49 2014
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 archive1-uat.project.domain.com                                1 Online
 admin1-uat.project.domain.com                                  2 Online
 mgmt1-uat.project.domain.com                                   3 Online
 map1-uat.project.domain.com                                    4 Online, Local
 map2-uat.project.domain.com                                    5 Online
 cache1-uat.project.domain.com                                 6 Online
 data1-uat.project.domain.com                                   8 Online


The /var/log/cluster/fenced.log on the nodes is logging lines like "Dec
01 16:02:34 fenced cpg_join error retrying" every tenth of a second.

Obviously we're having some major issues.  These are fresh boxes, with
no services running right now other than the ones related to the
cluster.

I've also experimented with <cman transport="udpu"/> to disable
multicast and see if that helped, but it doesn't seem to make a
difference in node stability.
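
For reference, this is where the option goes (a minimal sketch of the
relevant part of cluster.conf; everything else left as-is):

<cluster config_version="..." name="...">
  <cman transport="udpu"/>
  <clusternodes>
    ...
  </clusternodes>
  ...
</cluster>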

Is there a document or some other reference that I can give the network
folks on how the switches should be configured?  I've read posts on
various boards about IGMP snooping, but I couldn't find anything from
Red Hat to hand them.
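
One multicast test I've seen suggested on those boards is omping (which
I believe is packaged in EPEL): run the same command on every node at
the same time, and each node should report both unicast and multicast
responses from the others.

omping map1-uat map2-uat cache1-uat data1-uat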

On Mon, Dec 1, 2014 at 10:57 AM, Digimer <lists@xxxxxxxxxx> wrote:
> On 01/12/14 09:16 AM, Megan . wrote:
>>
>> Good Day,
>>
>> I'm fairly new to the cluster world, so I apologize in advance for
>> silly questions.  Thank you for any help.
>
>
> No pre-existing knowledge required, no need to apologize. :)
>
>> We decided to use this cluster solution in order to share GFS2 mounts
>> across servers.  We have a 7-node cluster that is newly set up, but
>> acting oddly.  It has 3 VMware guest hosts and 4 physical hosts (Dells
>> with iDRACs).  They are all running CentOS 6.6.  I have fencing
>> working (I'm able to run fence_node <node> and it will fence with
>> success).  I do not have the GFS2 mounts in the cluster yet.
>
>
> Very glad you have fencing, that's a common early mistake.
>
> A 7-node cluster is actually pretty large, and is around the upper end
> where tuning starts to become fairly important.
>
>> When I don't touch the servers, my cluster looks perfect with all
>> nodes online.  But when I start testing fencing, I have an odd problem
>> where I end up with split-brain between some of the nodes.  They won't
>> seem to automatically fence each other when it gets like this.
>
>
> If you get a split-brain, something is seriously broken; the fencing
> may not be working properly (getting a false success from the agent,
> for example).  Can you pastebin your cluster.conf (or fpaste, or
> something else where tabs are preserved to make it more readable)?
>
>> In the corosync.log for the node that gets split out, I see the totem
>> chatter, but it seems confused and just keeps doing the below over and
>> over:
>>
>> Dec 01 12:39:15 corosync [TOTEM ] Retransmit List: 22 24 25 26 27 28 29 2a 2b 2c
>>
>> Dec 01 12:39:17 corosync [TOTEM ] Retransmit List: 22 24 25 26 27 28 29 2a 2b 2c
>>
>> Dec 01 12:39:19 corosync [TOTEM ] Retransmit List: 22 24 25 26 27 28 29 2a 2b 2c
>>
>> Dec 01 12:39:39 corosync [TOTEM ] Retransmit List: 1 3 4 5 6 7 8 9 a b
>>
>> Dec 01 12:39:39 corosync [TOTEM ] Retransmit List: 1 3 4 5 6 7 8 9 a b 21 23 24 25 26 27 28 29 2a 2b 32
>> ..
>> ..
>> ..
>> Dec 01 12:54:49 corosync [TOTEM ] Retransmit List: 1 3 4 5 6 7 8 9 a b 1d 1f 20 21 22 23 24 25 26 27 2e 30 31 32 37 38 39 3a 3b 3c
>>
>> Dec 01 12:54:50 corosync [TOTEM ] Retransmit List: 1 3 4 5 6 7 8 9 a b 1d 1f 20 21 22 23 24 25 26 27 2e 30 31 32 37 38 39 3a 3b 3c
>>
>> Dec 01 12:54:50 corosync [TOTEM ] Retransmit List: 1 3 4 5 6 7 8 9 a b 1d 1f 20 21 22 23 24 25 26 27 2e 30 31 32 37 38 39 3a 3b 3c
>
>
> This is a sign of network congestion. This is the node saying "I lost some
> (corosync) data, please retransmit".
>
>> I can manually fence it, and it still comes online with the same
>> issue.  I end up having to take the whole cluster down, sometimes
>> forcing a reboot on some nodes, then bringing it back up.  It takes a
>> good part of the day just to bring the whole cluster online again.
>
>
> Something fence related is not working.
>
>> I used ccs -h node --sync --activate and double checked to make sure
>> they are all using the same version of the cluster.conf file.
>
>
> You can also use 'cman_tool version'.
>
>> One issue I did notice is that when one of the VMware hosts is
>> rebooted, the time comes up slightly skewed (6 seconds), but I thought
>> I read somewhere that a skew that minor shouldn't impact the cluster.
>
>
> IIRC, before RHEL 6.2 this was a problem; now it shouldn't be.  I am
> more curious about what might be underlying the skew, rather than the
> skew itself being a concern.
>
>
>> We have multicast enabled on the interfaces
>>
>>            UP BROADCAST RUNNING MASTER MULTICAST  MTU:9000  Metric:1
>> and we have been told by our network team that IGMP snooping is disabled.
>>
>> With tcpdump I can see the multi-cast traffic chatter.
>>
>> Right now:
>>
>> [root@data1-uat ~]# clustat
>> Cluster Status for projectuat @ Mon Dec  1 13:56:39 2014
>> Member Status: Quorate
>>
>>   Member Name                                            ID   Status
>>   ------ ----                                            ---- ------
>>   archive1-uat.domain.com                                1 Online
>>   admin1-uat.domain.com                                  2 Online
>>   mgmt1-uat.domain.com                                   3 Online
>>   map1-uat.domain.com                                    4 Online
>>   map2-uat.domain.com                                    5 Online
>>   cache1-uat.domain.com                                  6 Online
>>   data1-uat.domain.com                                   8 Online, Local
>>
>>
>>
>> ** Has itself as Offline **
>> [root@map1-uat ~]# clustat
>> Cluster Status for projectuat @ Mon Dec  1 13:57:07 2014
>> Member Status: Quorate
>>
>>   Member Name                                            ID   Status
>>   ------ ----                                            ---- ------
>>   archive1-uat.domain.com                                1 Online
>>   admin1-uat.domain.com                                  2 Online
>>   mgmt1-uat.domain.com                                   3 Online
>>   map1-uat.domain.com                                    4 Offline, Local
>>   map2-uat.domain.com                                    5 Online
>>   cache1-uat.domain.com                                  6 Online
>>   data1-uat.domain.com                                   8 Online
>
>
> That is really, really odd.  I think we'll need one of the Red Hat
> folks to chime in.
>
>
>> [root@cache1-uat ~]# clustat
>> Cluster Status for projectuat @ Mon Dec  1 13:57:39 2014
>> Member Status: Quorate
>>
>>   Member Name                                            ID   Status
>>   ------ ----                                            ---- ------
>>   archive1-uat.domain.com                                1 Online
>>   admin1-uat.domain.com                                  2 Online
>>   mgmt1-uat.domain.com                                   3 Online
>>   map1-uat.domain.com                                    4 Online
>>   map2-uat.domain.com                                    5 Online
>>   cache1-uat.domain.com                                  6 Offline, Local
>>   data1-uat.domain.com                                   8 Online
>>
>>
>>
>> [root@mgmt1-uat ~]# clustat
>> Cluster Status for projectuat @ Mon Dec  1 13:58:04 2014
>> Member Status: Inquorate
>>
>>   Member Name                                            ID   Status
>>   ------ ----                                            ---- ------
>>   archive1-uat.domain.com                                1 Offline
>>   admin1-uat.domain.com                                  2 Offline
>>   mgmt1-uat.domain.com                                   3 Online, Local
>>   map1-uat.domain.com                                    4 Offline
>>   map2-uat.domain.com                                    5 Offline
>>   cache1-uat.domain.com                                  6 Offline
>>   data1-uat.domain.com                                   8 Offline
>>
>>
>> cman-3.0.12.1-68.el6.x86_64
>>
>>
>> [root@data1-uat ~]# cat /etc/cluster/cluster.conf
>> <?xml version="1.0"?>
>> <cluster config_version="66" name="projectuat">
>> <clusternodes>
>> <clusternode name="admin1-uat.domain.com" nodeid="2">
>> <fence>
>> <method name="fenceadmin1uat">
>> <device name="vcappliancesoap" port="admin1-uat" ssl="on"
>> uuid="421df3c4-a686-9222-366e-9a67b25f62b2"/>
>> </method>
>> </fence>
>> </clusternode>
>> <clusternode name="mgmt1-uat.domain.com" nodeid="3">
>> <fence>
>> <method name="fenceadmin1uat">
>> <device name="vcappliancesoap" port="mgmt1-uat" ssl="on"
>> uuid="421d5ff5-66fa-5703-66d3-97f845cf8239"/>
>> </method>
>> </fence>
>> </clusternode>
>> <clusternode name="map1-uat.domain.com" nodeid="4">
>> <fence>
>> <method name="fencemap1uat">
>> <device name="idracmap1uat"/>
>> </method>
>> </fence>
>> </clusternode>
>> <clusternode name="map2-uat.domain.com" nodeid="5">
>> <fence>
>> <method name="fencemap2uat">
>> <device name="idracmap2uat"/>
>> </method>
>> </fence>
>> </clusternode>
>> <clusternode name="cache1-uat.domain.com" nodeid="6">
>> <fence>
>> <method name="fencecache1uat">
>> <device name="idraccache1uat"/>
>> </method>
>> </fence>
>> </clusternode>
>> <clusternode name="data1-uat.domain.com" nodeid="8">
>> <fence>
>> <method name="fencedata1uat">
>> <device name="idracdata1uat"/>
>> </method>
>> </fence>
>> </clusternode>
>> <clusternode name="archive1-uat.domain.com" nodeid="1">
>> <fence>
>> <method name="fenceadmin1uat">
>> <device name="vcappliancesoap" port="archive1-uat" ssl="on"
>> uuid="421d16b2-3ed0-0b9b-d530-0b151d81d24e"/>
>> </method>
>> </fence>
>> </clusternode>
>> </clusternodes>
>> <fencedevices>
>> <fencedevice agent="fence_vmware_soap" ipaddr="x.x.x.130"
>> login="fenceuat" login_timeout="10" name="vcappliancesoap"
>> passwd_script="/etc/cluster/forfencing.sh" power_timeout="10"
>> power_wait="30" retry_on="3" shell_timeout="10" ssl="1"/>
>> <fencedevice agent="fence_drac5" cmd_prompt="admin1-&gt;"
>> ipaddr="x.x.x.47" login="fenceuat" name="idracdata1uat"
>> passwd_script="/etc/cluster/forfencing.sh" power_timeout="60"
>> power_wait="60" retry_on="10" secure="on" shell_timeout="10"/>
>> <fencedevice agent="fence_drac5" cmd_prompt="admin1-&gt;"
>> ipaddr="x.x.x.48" login="fenceuat" name="idracdata2uat"
>> passwd_script="/etc/cluster/forfencing.sh" power_timeout="60"
>> power_wait="60" retry_on="10" secure="on" shell_timeout="10"/>
>> <fencedevice agent="fence_drac5" cmd_prompt="admin1-&gt;"
>> ipaddr="x.x.x.82" login="fenceuat" name="idracmap1uat"
>> passwd_script="/etc/cluster/forfencing.sh" power_timeout="60"
>> power_wait="60" retry_on="10" secure="on" shell_timeout="10"/>
>> <fencedevice agent="fence_drac5" cmd_prompt="admin1-&gt;"
>> ipaddr="x.x.x.96" login="fenceuat" name="idracmap2uat"
>> passwd_script="/etc/cluster/forfencing.sh" power_timeout="60"
>> power_wait="60" retry_on="10" secure="on" shell_timeout="10"/>
>> <fencedevice agent="fence_drac5" cmd_prompt="admin1-&gt;"
>> ipaddr="x.x.x.83" login="fenceuat" name="idraccache1uat"
>> passwd_script="/etc/cluster/forfencing.sh" power_timeout="60"
>> power_wait="60" retry_on="10" secure="on" shell_timeout="10"/>
>> <fencedevice agent="fence_drac5" cmd_prompt="admin1-&gt;"
>> ipaddr="x.x.x.97" login="fenceuat" name="idraccache2uat"
>> passwd_script="/etc/cluster/forfencing.sh" power_timeout="60"
>> power_wait="60" retry_on="10" secure="on" shell_timeout="10"/>
>> </fencedevices>
>> </cluster>
>
>
> -ENOPARSE
>
> My recommendation would be to schedule a maintenance window and then
> stop everything except cman (no rgmanager, no gfs2, etc.).  Then
> methodically test crashing all nodes (I like 'echo c > /proc/sysrq-trigger')
> and verify they are fenced and then recover properly.  It's worth
> disabling cman and rgmanager from starting at boot (period, but
> particularly for this test).
>
> If you can reliably (and repeatedly) crash -> fence -> rejoin, then I'd
> start loading back services and re-trying. If the problem reappears only
> under load, then that's an indication of the problem, too.
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster

-- 
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster



