RE: Two-node clusters using GFS and shared storage

Red Hat recommends that you use hardware-based fencing. Manual fencing causes a lot of problems; it's fine in dev, but not for production. When you pull the network cable, you have to fence the node and remove it from the cluster by hand before anything recovers. If you have some kind of power fencing available, you should use it; it will solve those problems.
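
For reference, a power-fence setup in cluster.conf looks roughly like this. This is only a sketch: the device name, address, credentials and port are placeholders for whatever power switch you actually have (here an APC PDU driven by the fence_apc agent).

-- power fencing sketch --

<fencedevices>
  <!-- placeholder: an APC power switch reachable at 10.0.0.50 -->
  <fencedevice agent="fence_apc" name="apc1" ipaddr="10.0.0.50" login="apc" passwd="apc"/>
</fencedevices>

<clusternode name="node1" votes="1">
  <!-- node1 is plugged into outlet 1 of that switch (placeholder port) -->
  <fence>
    <method name="1">
      <device name="apc1" port="1"/>
    </method>
  </fence>
</clusternode>

-- end power fencing sketch --

With something like that in place, when node1 drops off the network, node2 can power-cycle it automatically and GFS carries on instead of blocking.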


Robert Gil
Linux Systems Administrator
American Home Mortgage

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of José Miguel Parrella Romero
Sent: Friday, July 13, 2007 12:19 PM
To: linux-cluster@xxxxxxxxxx
Subject:  Two-node clusters using GFS and shared storage

Greetings,

I've been trying to set up a two-node cluster using a shared SAN (attached via Fibre Channel) and GFS. I've previously tried OCFS2, and I don't want to use NFS yet. The cluster must be active-active, and it runs on Itanium 2 machines with Debian 4.0. I'm using cman 1.03.00.

I've set up the cluster using the Red Hat tools, and my /etc/cluster/cluster.conf looks like this:

-- my cluster.conf --

<?xml version="1.0"?>
<cluster name="correo" config_version="1">

  <cman two_node="1" expected_votes="1"/>

  <clusternodes>
    <clusternode name="node1" votes="1"/>
    <clusternode name="node2" votes="1"/>
  </clusternodes>

</cluster>

-- end my cluster.conf --

Note that I've removed the entries related to fencing, but I previously had a 'manual' fencing method. I have an LVM volume which contains a GFS filesystem, and I'm able to start ccsd, cman, fenced, clvmd and all the other related daemons.
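
For reference, a 'manual' fencing method in cluster.conf looks roughly like this (just a sketch; "human" is an arbitrary device name, and every failure then has to be acknowledged by hand with fence_ack_manual):

-- manual fencing sketch --

<fencedevices>
  <!-- fence_manual: no hardware involved, an operator must intervene -->
  <fencedevice agent="fence_manual" name="human"/>
</fencedevices>

<clusternode name="node1" votes="1">
  <fence>
    <method name="1">
      <!-- nodename tells the agent which node is to be fenced -->
      <device name="human" nodename="node1"/>
    </method>
  </fence>
</clusternode>

-- end manual fencing sketch --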

Syslog reports that the cluster is quorate, and I'm able to mount the filesystem on both of my nodes. Both nodes need to write to the shared storage in an active-active fashion.

I expected that removing the network cable from node1 would do the following:

a) node1 would be disabled (right, it doesn't have a network cable)
b) node2 would notice node1 is gone and would keep writing to the shared storage
c) Eventually node1 would come back, node2 would notice it, and node1 would hopefully start writing again

And this is what actually happens when I unplug the network cable:

a) node1 is disabled (no connectivity)
b) node2 is also disabled! (trying to write to /home and /var/mail stalls the machine, and then logins and other processes stall as well)
c) Plugging the cable back in does nothing (both machines are hung at this point, so I have to reboot them)

I'm probably missing something, since the same setup using OCFS2 has exactly the same problem! Our last-resort option is active-active NFS using Heartbeat, but then we would be writing to the storage over Ethernet (1 Gbps) rather than to the SAN over Fibre Channel (2 Gbps), since we don't have any other media around at the moment.

Is this a configuration-related problem? Or is this a design feature of both GFS and OCFS2? Or maybe I'm just missing the whole picture...

Thank you very much for any advice,
Jose

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
