Doug Tucker wrote:
In your cluster.conf, make sure the
<clusternode name="node1c"....
entry points at a private crossover IP of the node. Say you have a
second dedicated Gb interface for the clustering; assign it an address,
say 10.0.0.1, and in the hosts file have something like
10.0.0.1 node1c
10.0.0.2 node2c
That way each node in the cluster is referred to by its cluster
interface name, and thus the cluster communication will go over that
dedicated interface.
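For illustration, the clusternode entries in cluster.conf would then
look something like this (the nodeid/votes values are just placeholders,
and the fence sections are left out for brevity):

    <clusternodes>
        <clusternode name="node1c" nodeid="1" votes="1">
            <!-- fence configuration as before -->
        </clusternode>
        <clusternode name="node2c" nodeid="2" votes="1">
            <!-- fence configuration as before -->
        </clusternode>
    </clusternodes>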
I'm not sure I understand this correctly, so please bear with me: are
you saying the communication runs over the fenced interface?
No, over a dedicated, separate interface.
Or that the
node name should reference a separate NIC that is private, and the
exported virtual IP to the clients is done over the public interface?
That's the one.
I'm confused, I thought that definition had to be the same as the
hostname of the box?
No. The floating IPs will get assigned to whatever interface has an IP
on that subnet. The cluster/DLM comms interface is inferred from the
node name.
Here is what is in my conf file for reference:
<clusternode name="engrfs1.seas.smu.edu" nodeid="1" votes="1">
    <fence>
        <method name="1">
            <device modulename="" name="engrfs1drac"/>
        </method>
    </fence>
</clusternode>
<clusternode name="engrfs2.seas.smu.edu" nodeid="2" votes="1">
    <fence>
        <method name="1">
            <device modulename="" name="engrfs2drac"/>
        </method>
    </fence>
</clusternode>
where engrfs1 and engrfs2 are the actual hostnames of the boxes.
Add another NIC, give it a private IP/subnet, put it in the hosts
file on both nodes as something like engrfs1-cluster.seas.smu.edu, and
use that name in the clusternode name entry.
The fail-over resources (typically client-side IPs) remain as they are
on the client-side subnet.
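Concretely, something like this should do it (the 10.10.10.x addresses
below are just an example private subnet, pick whatever suits you):

    /etc/hosts on both nodes:
        10.10.10.1   engrfs1-cluster.seas.smu.edu   engrfs1-cluster
        10.10.10.2   engrfs2-cluster.seas.smu.edu   engrfs2-cluster

    cluster.conf:
        <clusternode name="engrfs1-cluster.seas.smu.edu" nodeid="1" votes="1">
            <!-- fence section unchanged -->
        </clusternode>
        <clusternode name="engrfs2-cluster.seas.smu.edu" nodeid="2" votes="1">
            <!-- fence section unchanged -->
        </clusternode>

The engrfs1/engrfs2 public names and addresses stay as they are for the
clients.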
It sounds like you are seeing write contention. Make sure you mount
everything with noatime,nodiratime,noquota, both on the GFS side and on
the NFS clients' side. Otherwise every read will also require a write,
and that'll kill any hope of getting decent performance out of the system.
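For example, something along these lines (the device, paths and VIP
hostname are made up, adjust to suit):

    GFS side, /etc/fstab on the cluster nodes:
        /dev/vg_gfs/lv_export  /export  gfs  defaults,noatime,nodiratime  0 0

    NFS client side, /etc/fstab:
        nfs-vip:/export  /data  nfs  rw,noatime,nodiratime  0 0

(plus noquota if you can live without quotas).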
Already mounted noatime, will add nodiratime. Can't do noquota, as we
implement quotas for every user here (5,000 or so), and did so on the
old file server.
I'm guessing the old server was standalone, rather than clustered?
No, clustered, as I assume you realized below, just making sure it's
clear.
OK, noted.
I see, so you had two servers in a load-sharing write-write
configuration before, too?
We certainly were capable of such. However, here, as we did there, we
set it up in more of a failover mode. We export a virtual IP attached
to the NFS export, and all clients mount the VIP, so whichever machine
has the VIP at a given time is "master" and gets all the traffic. The
only exception to this is the backups that run at night, which we do on
the "secondary" machine directly rather than through the VIP. The
secondary is only there in the event of a failure of node1; when node1
comes back online, the service is set up to fail back to node1.
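In cluster.conf terms the setup is roughly along these lines (the names,
VIP address and paths here are illustrative placeholders rather than our
real values):

    <rm>
        <failoverdomains>
            <!-- ordered domain: the service prefers node1 and fails back to it -->
            <failoverdomain name="prefer-node1" ordered="1" restricted="1">
                <failoverdomainnode name="engrfs1.seas.smu.edu" priority="1"/>
                <failoverdomainnode name="engrfs2.seas.smu.edu" priority="2"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <clusterfs name="gfs-export" mountpoint="/export"
                       device="/dev/vg_gfs/lv_export" fstype="gfs"
                       options="noatime,nodiratime"/>
            <nfsexport name="exports"/>
            <nfsclient name="all-clients" target="*" options="rw"/>
            <ip address="192.0.2.10" monitor_link="1"/>
        </resources>
        <service name="nfs-svc" domain="prefer-node1" autostart="1">
            <clusterfs ref="gfs-export">
                <nfsexport ref="exports">
                    <nfsclient ref="all-clients"/>
                </nfsexport>
            </clusterfs>
            <ip ref="192.0.2.10"/>
        </service>
    </rm>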
OK, that should be fine, although you may find there's less of a
performance hit if you do the backup from the master node, too, as
that'll already have the locks on all the files.
If you set the nodes up in a fail-over configuration and serve all the
traffic from the primary node, you may see the performance improve due
to locks not being bounced around all the time; they'll get set on the
master node and stay there until the master node fails and its floating
IP gets migrated to the other node.
As explained above, that's exactly how it is set up. The old file
server was the same way. We're basically scratching our heads in
disbelief here. No ifs, ands or buts about it: hardware-wise, we have
500% more box than we used to have. The configuration architecture is
virtually identical. Which leaves us with the software, which leaves
only two conclusions we can come up with:
1) Tru64 and TruCluster with AdvFS from 7 years ago is simply that much
more robust and mature than RHES4 and CS/GFS and therefore tremendously
outperforms it... or
RHEL4 is quite old. It's been a while since I used it for clustering.
RHEL5 has yielded considerably better performance in my experience.
2) We have this badly configured.
There isn't all that much to tune on RHEL4 cluster-wise; most of the
tweakability was added after I last used it. I'd say RHEL5 is certainly
worth trying. The problem you are having may just go away.
Gordan