Re: new cluster acting odd

On 01/12/14 01:03 PM, Megan . wrote:
We have 11 10-20TB GFS2 mounts that I need to share across all nodes.
It's the only reason we went with the cluster solution.  I don't know
how we could split it up into different smaller clusters.

I would do this, personally:

A 2-node cluster: DRBD (on top of local disks or a pair of SANs, one per node), exported over NFS in a simple single-primary (master/slave) configuration with a floating IP.

GFS2, like any clustered filesystem, requires cluster locking, and that locking comes with non-trivial overhead. Exporting over NFS lets you avoid that bottleneck, and with a simple 2-node cluster behind the scenes you still get full HA.
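
To give a rough idea of the moving parts, here is a sketch of what the active node ends up doing (the resource, device, path, interface and IP names below are all placeholders, and in practice rgmanager or pacemaker would drive these steps rather than a person):

  drbdadm primary r0                       # promote the DRBD resource on the node taking over
  mount /dev/drbd0 /export                 # mount the filesystem that sits on top of DRBD
  exportfs -r                              # (re)export whatever is listed in /etc/exports
  ip addr add 192.168.100.10/24 dev bond0  # bring up the floating IP the NFS clients point at
  service nfs start                        # start the NFS server if it isn't already running

The node being demoted does the reverse: stop nfs, drop the IP, umount, then 'drbdadm secondary r0'.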

In HA, nothing is more important than simplicity. Said another way:

"A cluster isn't beautiful when there is nothing left to add. It is beautiful when there is nothing left to take away."

On Mon, Dec 1, 2014 at 12:14 PM, Digimer <lists@xxxxxxxxxx> wrote:
On 01/12/14 11:56 AM, Megan . wrote:

Thank you for your replies.

The cluster is intended to be 9 nodes, but I haven't finished building
the remaining 2.  Our production cluster is expected to be similar in
size.  What tuning should I be looking at?


Here is a link to our config.  http://pastebin.com/LUHM8GQR  I had to
remove IP addresses.


Can you simplify those fencedevice definitions? I wonder whether the
timeouts you've set could be part of the problem. Always start with the
simplest possible configuration and only add options in response to actual
issues discovered in testing.
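
Once it's trimmed down, validating and redistributing the config with the standard tools should catch anything obvious; roughly:

  ccs_config_validate    # check the simplified /etc/cluster/cluster.conf against the schema
  cman_tool version -r   # after bumping config_version, push the new config to the other nodes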

I can try to simplify.  I set the longer timeouts because of what I saw
happening on the physical boxes: a box would be on its way down or up,
the fence command would fail, but the box actually did come back online.
The physicals take 10-15 minutes to reboot and I wasn't sure how to
handle timeout issues, so I made the timeouts a bit extreme for testing.
I'll try to make the config more vanilla for troubleshooting.

I'm not really sure why the state of the node should impact the fence action in any way. Fencing is supposed to work, regardless of the state of the target.

Fencing works like this (with a default config, on most fence agents):

1. Force off
2. Verify off
3. Try to boot, don't care if it succeeds.

So once the node is confirmed off by the agent, the fence is considered a success. How long (if at all) it takes for the node to reboot does not factor in.
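
You can watch that sequence yourself from another node. For example, if your iDRACs answer to fence_ipmilan (the address and credentials here are placeholders):

  fence_ipmilan -a 10.0.0.50 -l root -p secret -P -o status   # ask the BMC for the target's power state
  fence_node map1-uat.project.domain.com                      # full fence, using whatever cluster.conf defines for that node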

I tried crashing a node with 'echo c > /proc/sysrq-trigger'; the cluster
kept seeing it as online and never fenced it, yet I could no longer ssh
to the node.  I did this on a physical box and a VM with the same result.
I had to run 'fence_node <node>' to get it to reboot, but it came up
split-brained (thinking it was the only one online). Now that node has
cman down and the rest of the cluster still sees it as online.


Then corosync failed to detect the fault. That is a sign, to me, of a
fundamental network or configuration issue. Corosync should have shown
messages about a node being lost and reconfiguring. If that didn't happen,
then you're not even up to the point where fencing factors in.

Did you configure corosync.conf? When it came up, did it think it was
quorate or inquorate?
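
On any node, that view is easy to check directly:

  cman_tool status   # quorum state, expected votes, total votes
  cman_tool nodes    # which nodes this member currently sees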

corosync.conf didn't work, since it seems the Red Hat HA cluster stack
doesn't use that file: http://people.redhat.com/ccaulfie/docs/CmanYinYang.pdf
We tried it because we wanted to put the multicast traffic on a
different bond/VLAN, but we figured out the file isn't used.

Right, I wanted to make sure that, if you had tried it, you've since removed corosync.conf entirely. Corosync is fully controlled by the cman cluster.conf file.
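
If you want to confirm what corosync actually picked up from cluster.conf (bind address, multicast group, port and so on), you can ask the running daemon; the exact object names may vary a little between versions:

  corosync-objctl | grep -i totem.interface   # bindnetaddr / mcastaddr / mcastport in use
  corosync-cfgtool -s                         # ring status as corosync sees it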

I thought fencing was working because I'm able to run 'fence_node <node>'
and see the box reboot and come back online.  I did have to get the Fedora
(fc14) build of the fence agents because of an issue with the iDRAC agent
not working properly.  We are running fence-agents-3.1.6-1.fc14.x86_64


That tells you that the configuration of the fence agents is working, but it
doesn't test failure detection. You can use the 'fence_check' tool to see if
the cluster can talk to everything, but in the end, the only useful test is
to simulate an actual crash.
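
Roughly, the test loop looks like this (everything except the crash is run from a surviving node):

  fence_check                  # confirm the cluster can talk to every node's fence device
  # on the victim node (this really does kill the kernel, so have console access handy):
  echo c > /proc/sysrq-trigger
  # back on a surviving node, watch for the membership change and the fence action:
  tail -f /var/log/messages /var/log/cluster/fenced.log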

Wait, 'fc14'?! What OS are you using?

We are on CentOS 6.6.  I went with the Fedora agents because of this
exact issue: http://forum.proxmox.com/threads/12311-Proxmox-HA-fencing-and-Dell-iDrac7
I read that it was fixed in the next version, which I could only find
built for Fedora.

It would be *much* better to file a bug report (https://bugzilla.redhat.com/enter_bug.cgi?product=Red%20Hat%20Enterprise%20Linux%206) -> Version: 6.6 -> Component: fence-agents

Mixing RPMs from other OSes is not a good idea at all.
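
A quick way to see how far the installed package has drifted from what CentOS 6 ships:

  rpm -qi fence-agents | grep -E 'Version|Release|Vendor'   # what's installed and where it came from
  yum list fence-agents                                     # what the CentOS repos provide, for comparison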

fence_tool dump worked on one of my nodes, but it is just hanging on the
rest.

[root@map1-uat ~]# fence_tool dump
1417448610 logging mode 3 syslog f 160 p 6 logfile p 6 /var/log/cluster/fenced.log
1417448610 fenced 3.0.12.1 started
1417448610 connected to dbus :1.12
1417448610 cluster node 1 added seq 89048
1417448610 cluster node 2 added seq 89048
1417448610 cluster node 3 added seq 89048
1417448610 cluster node 4 added seq 89048
1417448610 cluster node 5 added seq 89048
1417448610 cluster node 6 added seq 89048
1417448610 cluster node 8 added seq 89048
1417448610 our_nodeid 4 our_name map1-uat.project.domain.com
1417448611 logging mode 3 syslog f 160 p 6 logfile p 6 /var/log/cluster/fenced.log
1417448611 logfile cur mode 100644
1417448611 cpg_join fenced:daemon ...
1417448621 daemon cpg_join error retrying
1417448631 daemon cpg_join error retrying
1417448641 daemon cpg_join error retrying
1417448651 daemon cpg_join error retrying
1417448661 daemon cpg_join error retrying
1417448671 daemon cpg_join error retrying
1417448681 daemon cpg_join error retrying
1417448691 daemon cpg_join error retrying
.
.
.


[root@map1-uat ~]# clustat
Cluster Status for gibsuat @ Mon Dec  1 16:51:49 2014
Member Status: Quorate

   Member Name                                       ID   Status
   ------ ----                                       ---- ------
   archive1-uat.project.domain.com                      1 Online
   admin1-uat.project.domain.com                        2 Online
   mgmt1-uat.project.domain.com                         3 Online
   map1-uat.project.domain.com                          4 Online, Local
   map2-uat.project.domain.com                          5 Online
   cache1-uat.project.domain.com                        6 Online
   data1-uat.project.domain.com                         8 Online


The /var/log/cluster/fenced.log on the nodes has been repeating
"Dec 01 16:02:34 fenced cpg_join error retrying" every tenth of a second.

Obviously we are having some major issues.  These are fresh boxes, running
no services right now other than the ones related to the cluster.


What OS/version?
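
In the meantime, on the node that is stuck retrying cpg_join, it would be worth comparing what cman and fenced think is going on; for example:

  cman_tool status   # is this node quorate, and does its view match the rest of the cluster?
  cman_tool nodes    # membership as seen from the stuck node
  group_tool ls      # state of the fence/dlm/gfs groups; look for entries sitting in a wait state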

I've also experimented with <cman transport="udpu"/> to disable
multicast and see if that helped, but it doesn't seem to make a
difference in node stability.


Very bad idea on clusters larger than 2~3 nodes; the udpu overhead will be
far too great for a 7~9 node cluster.

Is there a document or some sort of reference that I can give the
network folks on how the switches should be configured?  I've read things
on forums about IGMP snooping, but I couldn't find anything from
Red Hat to hand them.


I have this:

https://alteeve.ca/w/AN!Cluster_Tutorial_2#Six_Network_Interfaces.2C_Seriously.3F

https://alteeve.ca/w/AN!Cluster_Tutorial_2#Network_Switches

https://alteeve.ca/w/AN!Cluster_Tutorial_2#Network_Security_Considerations

https://alteeve.ca/w/AN!Cluster_Tutorial_2#Network

There are comments in there about multicast, etc.
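
One concrete thing you can hand the network folks is an end-to-end multicast test. As far as I know, omping is in the standard CentOS 6 repos and exists for exactly this; run the same command on every node at roughly the same time (the node names below are just a few of yours, as an example):

  omping map1-uat.project.domain.com map2-uat.project.domain.com cache1-uat.project.domain.com
  # 'unicast' replies working while 'multicast' replies fail usually points at
  # IGMP snooping / querier settings on the switches.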


Thank you for the links.  I will review them with our network folks;
hopefully they will help us sort out some of our issues.

I will use the fence_check tool to see if I can troubleshoot the fencing.

Thank you very much for all of your suggestions.

Happy to help. :)

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster



