Re: qdisk WITHOUT fencing

On 06/18/2010 11:28 AM, Jankowski, Chris wrote:

Can you please sort out the (lack of) word-wraps in your email client?

Do you have a better idea? How do you propose to ensure that there
is no resource clash when a node becomes intermittent or half-dead?
How do you prevent its interference from bringing down the service?
What do you propose? More importantly, how would you propose to handle
this when ensuring consistency is of paramount importance, e.g. when
using a cluster file system?

I believe that SCSI reservations are the key to protection.  One can
form a group of hosts that are allowed to access storage and exclude
those that have had their membership revoked. Note that this is a protective
mechanism - the stance here is: "This is ours and we protect it."  A
node that has been ejected cannot do damage anymore.  This is
the philosophically opposite approach to fencing, which is: "I'll go out and
shoot everybody whom I consider suspect, and I am not going to come back
until I've successfully shot everybody whom I consider suspect."

It isn't opposite philosophically at all. Instead of fencing by powering off the offending machine, you are fencing by cutting the machine off from the SAN. Logically, the two are identical, but you then also potentially need to apply other fencing for, say, network resources. I've written a fencing agent before for a managed switch that fences a machine by disabling its switch port. That works as well as power fencing, but it isn't at all fundamentally different.
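For what it's worth, such an agent is only a thin wrapper around whatever management interface the switch exposes. Something along these lines (an untested Python sketch - the name=value-on-stdin convention is how fenced drives its agents, but the snmpset call, community string and port numbering are assumptions about one particular kind of switch):

    #!/usr/bin/python
    # Minimal switch-port fencing sketch. fenced hands the agent its
    # options as "name=value" lines on stdin (action, ipaddr, port, ...).
    import subprocess
    import sys

    opts = {}
    for line in sys.stdin:
        if "=" in line:
            name, value = line.strip().split("=", 1)
            opts[name] = value

    action = opts.get("action", opts.get("option", "off"))
    switch = opts["ipaddr"]                        # switch management address
    port = opts["port"]                            # ifIndex of the node's port
    community = opts.get("community", "private")   # assumption

    # IF-MIB ifAdminStatus: 1 = up, 2 = down
    status = "2" if action == "off" else "1"
    rc = subprocess.call(["snmpset", "-v2c", "-c", community, switch,
                          "IF-MIB::ifAdminStatus.%s" % port, "i", status])
    sys.exit(0 if rc == 0 else 1)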

A properly implemented quorum disk is the key to management of the
cluster membership. Based on access to the quorum disk one can then
establish who is a member. The nodes ejected are configured to
commit suicide, reboot and try to rejoin the cluster.

If a node crashes, it cannot be expected to remain functional enough to commit suicide.
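The heartbeat/membership half of what you describe is easy enough to picture. Very roughly (a toy sketch, nothing like qdiskd's real on-disk format - the device path, node slots and timeouts are all made up): each node periodically stamps its own block on the shared disk and treats any node whose stamp goes stale as dead. The hard part is exactly the case above - making sure a node that has stopped stamping can no longer write to the data LUNs.

    # Toy quorum-disk heartbeat: one 512-byte block per node holding an
    # ASCII timestamp (hypothetical layout).
    import os
    import time

    QDISK = "/dev/mapper/qdisk"   # assumption
    NODE_ID = 0                   # this node's slot
    NODES = 4
    BLOCK = 512
    INTERVAL = 1.0
    TKO = 10                      # missed intervals before a node is declared dead

    fd = os.open(QDISK, os.O_RDWR | os.O_SYNC)

    while True:
        # Stamp our own block.
        stamp = ("%.3f" % time.time()).ljust(BLOCK).encode()
        os.pwrite(fd, stamp, NODE_ID * BLOCK)

        # Check everybody else's stamp.
        for n in range(NODES):
            if n == NODE_ID:
                continue
            raw = os.pread(fd, BLOCK, n * BLOCK).split(b"\0")[0]
            try:
                age = time.time() - float(raw.decode("ascii", "ignore").strip())
            except ValueError:
                continue          # slot never written
            if age > TKO * INTERVAL:
                print("node %d looks dead (last stamp %.1fs ago)" % (n, age))
                # ...and this is where eviction/fencing has to happen.

        time.sleep(INTERVAL)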

Then, based on membership, one can set up SCSI reservations on shared
storage.  This will protect the integrity of the filesystems, including
a shared cluster filesystem.

See above - the distinction between powering a node off and cutting off all its network access is pretty immaterial. It doesn't get you away from the fundamental problem that you need a reliable way of preventing the failing node from rejoining the cluster.

Note that there is a natural affinity between the quorum disk on shared
storage and the shared cluster file system on the shared storage. Whoever
has access to the quorum disk has access to shared storage and can
stay as a member. Whoever does not should be ejected. Whether such a
node is dead, half-dead or actively looking for mischief is irrelevant,
because it does not have access to storage once SCSI reservations have
been set to exclude it. It won't get anywhere without access to storage.

Sure - but I don't think anyone ever argued that power-based fencing is mandatory. Brocade switch-based fencing from the SAN side was supported last time I checked the list of supported fencing devices for RHCS.
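The SCSI reservation route is available as well - fence_scsi is built on the same idea, as far as I know. The mechanics come down to a handful of sg_persist calls. A rough sketch (the device path and keys are invented, and in practice the fence agent drives this rather than a hand-rolled script):

    # Sketch of SCSI-3 persistent-reservation based exclusion.
    import subprocess

    DEV = "/dev/sdb"      # shared LUN (made up)
    MY_KEY = "0x1"        # this node's registration key
    VICTIM_KEY = "0x2"    # key of the node being ejected

    def sg_persist(*args):
        subprocess.check_call(["sg_persist"] + list(args) + [DEV])

    # Register our key and take a "Write Exclusive, Registrants Only"
    # reservation (type 5): only registered keys may write to the LUN.
    # (In practice every node registers and one of them holds the
    # reservation.)
    sg_persist("--out", "--register", "--param-sark=" + MY_KEY)
    sg_persist("--out", "--reserve", "--param-rk=" + MY_KEY, "--prout-type=5")

    # Ejecting a node means preempting its key; the array then refuses
    # its writes regardless of what state the node itself is in.
    sg_persist("--out", "--preempt", "--param-rk=" + MY_KEY,
               "--param-sark=" + VICTIM_KEY, "--prout-type=5")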

This is how DEC/Compaq/HP TruCluster V5.x works. It does support a shared
cluster filesystem.  In fact, this is the only filesystem that it
supports, except for UFS for CD-ROMs. And it supports shared root.

Shared Root is supported on Linux, in a lot of ways. Open Shared Root is one example, and I've even written a set of extensions to make that work on GlusterFS. I think it's in the OSR contrib repository.

There is only one password file, one group file, one set of binaries
and libraries, all shared in CFS. And it has rolling upgrades. It
works reliably, and there is not a trace of fencing in it.  So, it can
be done.  This is living proof that it works.

I think we are not agreeing entirely on what "fencing" actually is. And you are still talking about solving a problem that isn't hard to solve with RHCS - single SAN, at one location. If the machines are in one place, fencing isn't a problem. What's difficult is fencing in a geographically dispersed setup. I thought this was the main point of this thread.

Gordan

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster

