On Mon, Mar 07, 2005 at 09:56:05PM +0800, ?????? wrote:
> Hi,
> if I have a cluster in which only one node does write operations to
> the storage at a time, can I use GFS without a fence system?

No.

> Or use the fence system without hardware (such as an FC switch or
> power switch) involved,

You can use the fence_manual agent, but this isn't recommended for
production use because it requires manual intervention whenever a node
fails.

> do the fence operations happen automatically immediately after the
> heartbeat detects a bad node?

Usually, yes.  You can set post_fail_delay, which is the number of
seconds the fence daemon will wait before fencing the failed node.  If
the failed node restarts and rejoins the cluster within that time, it
won't be fenced.  (Note: if you set post_fail_delay to -1, then fenced
will wait forever for the node to rejoin the cluster and no actual
fencing will occur.)  Also, if the cluster loses quorum, fencing will
be delayed until quorum is regained.  (There's a minimal cluster.conf
sketch at the end of this message illustrating these settings.)

> Conversely, under what conditions would data corruption happen
> without fencing in a GFS system?  Two nodes writing to the same
> LV|GFS|file|block (which?) at the same time?

Fencing doesn't prevent data corruption (contents of files).  Fencing
does prevent file system corruption (gfs metadata) in this case:

1. cluster members are node1, node2, node3; all are in the fence
   domain and have gfs mounted
2. node1 becomes unresponsive to node2/node3
3. node2/node3 think node1 is dead
4. node2 replays node1's journal
5. node1 writes to gfs -- this can corrupt the fs because it happens
   after its journal is replayed

Node1 may have hung for a while and then woken up.  Or, node1 may
simply have lost its network connection, in which case node2/node3
think it's dead, but gfs on node1 may still be writing to storage
using the locks it already holds.

We can prevent the gfs corruption above in one of two ways:

i)  Using hardware fencing to prevent step 5 from ever happening
    successfully after step 4.

ii) Only allowing step 4 to happen after we're certain node1 has been
    cleanly reset (one way, mentioned above, is noticing that node1
    has rejoined the cluster).

Very often method ii requires manual intervention, so method i is
necessary for serious gfs usage.

PS. There is a third way to prevent gfs corruption that we don't use,
although the fs code is in place to do it.  This third method could
only be applied in the situation where node1 becomes accessible again
after step 3.

iii) Only allowing step 4 to happen after we're certain that gfs on
     node1 will not write any more to the device and has no
     outstanding writes to the device.

We haven't spent time on this method because most often it would not
apply, and methods i/ii would still be necessary.

-- 
Dave Teigland  <teigland@xxxxxxxxxx>
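
For reference, a minimal /etc/cluster/cluster.conf sketch showing the
two settings discussed above: post_fail_delay on the fence daemon, and
manual fencing via the fence_manual agent.  The cluster name, node
names, and the 20-second delay are hypothetical examples.

<?xml version="1.0"?>
<cluster name="alpha" config_version="1">
  <!-- wait 20s before fencing a failed node; if it rejoins in that
       time it won't be fenced.  -1 would make fenced wait forever. -->
  <fence_daemon post_fail_delay="20" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="node1" votes="1">
      <fence>
        <method name="single">
          <device name="human" nodename="node1"/>
        </method>
      </fence>
    </clusternode>
    <!-- node2 and node3 would be defined the same way -->
  </clusternodes>
  <fencedevices>
    <!-- no hardware involved: an operator must verify the failed node
         is really down, then acknowledge the fence by hand -->
    <fencedevice name="human" agent="fence_manual"/>
  </fencedevices>
</cluster>

With a configuration like this, when node1 fails, journal replay waits
until an operator resets node1 and acknowledges it by hand (with
something like fence_ack_manual -n node1); that manual step is exactly
why fence_manual isn't recommended for production use.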