Re: stonith vs. quorums

Digimer <lists@xxxxxxxxxx> · Thu, 26 Jul 2012 11:56:50 -0400

You have the concept pretty much right.

That said, fencing (aka stonith) is always a good idea. The worst thing 
in a cluster is to have a node in an unknown state. If you make 
assumptions, they could prove dangerously wrong.

In my clusters, I always put DRBD traffic on its own link. This also 
helps keep its data from saturating the network used by other things.

For example, say you have eth0 facing your LAN. Here you would have the 
fence devices. Add eth1 and use its IP in the rX.res resource 
definition file. So now, if eth1 drops, the fence call can still go out 
over eth0. Make sense?

As for the mechanics of fencing: if you have a cman cluster, you can use 
the 'obliterate-peer.sh' script in DRBD's global config. Alternatively, 
if you have a Pacemaker cluster, you can use the 'crm-fence-peer.sh' 
script.

In any case, please always set up proper fencing. With shared storage, 
it's the only safe option.

I cover an example of why this is recommended here:

https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing

And how to set up DRBD for hooking into a cluster's fencing here:

https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Hooking_DRBD_Into_The_Cluster.27s_Fencing

In case it helps, this is how I set up my networking to support the 
cluster and DRBD in a redundant manner (using bonds):

https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Network

Cheers

On 07/26/2012 09:29 AM, Antonis Christofides wrote:
Hello,

let me see if I've understood correctly. Assume the following cluster:

     Internet              Internet
       |                     |
       |                     |
      node1 -------------- node2
  (drbd primary)         (drbd secondary)

Now the network is partitioned. node2 can't communicate with node1. So
it thinks node1 is down. So it makes itself a drbd primary, and clients
are now writing data to node2. But node1 is not really down, it's just
disconnected from node2. Some clients are still accessing node1 and
writing to it. Result: chaos. So we use stonith to guard against this
issue, but stonith requires a second, independent communication channel
to node1. I have no experience with this, but I guess "independent" is a
big word.
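
The failure mode described above can be sketched in a few lines of
Python. This is a toy model, not DRBD: the `Replica` class and the
`simulate_split_brain` function are hypothetical names made up for
illustration, and each replica is just a dictionary mapping a block
number to its data.

```python
class Replica:
    """Toy stand-in for one node's copy of the replicated device."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write(self, block, data):
        self.blocks[block] = data


def simulate_split_brain():
    node1, node2 = Replica("node1"), Replica("node2")

    # While the link is up, every write reaches both copies.
    for replica in (node1, node2):
        replica.write(0, "replicated before the partition")

    # The link drops and, with no fencing, both nodes act as primary.
    node1.write(1, "written by a client of node1")
    node2.write(1, "written by a client of node2")

    # Blocks whose contents now differ between the two copies.
    all_blocks = node1.blocks.keys() | node2.blocks.keys()
    return sorted(b for b in all_blocks
                  if node1.blocks.get(b) != node2.blocks.get(b))


print(simulate_split_brain())  # [1]
```

Block 1 now holds different data on each node, and neither copy can be
called the correct one without discarding somebody's writes.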

Instead:

     Internet              Internet
       |                     |
       |                     |
      node1 -------------- node2
  (drbd primary)         (drbd secondary)
       |                     |
       |                     |
       +-------- node3 ------+

The network is partitioned; say node1 is disconnected from both node2
and node3. Node1, seeing that it no longer has quorum (it is alone),
switches itself to drbd secondary. node2, seeing that node1 is no longer
there and that together with node3 it has a quorum, makes itself a drbd
primary.
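
The majority rule assumed in this scenario can be written down as a
small Python sketch. The function name `has_quorum` and the node names
are illustrative assumptions, not part of any cluster stack's API; real
quorum implementations may also weight votes or special-case two-node
clusters.

```python
ALL_NODES = {"node1", "node2", "node3"}


def has_quorum(partition, all_nodes=ALL_NODES):
    """Return True if `partition` holds a strict majority of `all_nodes`."""
    return 2 * len(partition) > len(all_nodes)


# node1 is cut off from the other two nodes.
print(has_quorum({"node1"}))            # False: node1 should step down
print(has_quorum({"node2", "node3"}))   # True: node2 may become primary

# In the two-node layout from the first diagram, neither side of a
# partition can reach a strict majority on its own.
print(has_quorum({"node1"}, {"node1", "node2"}))  # False
```

Note that this only tells each side what it should do. It does not
confirm that the isolated node actually stopped writing.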

This looks to me simpler and easier to achieve (but again, I have no
experience). Isn't it a valid alternative solution, that makes stonith
unnecessary? I'm asking because in some places in the Pacemaker or drbd
documentation it says "don't do this without stonith!"; it doesn't say
"don't do this without either stonith or quorum!", and I was therefore
wondering whether I've understood something wrong.
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss

--
Digimer
Papers and Projects: https://alteeve.com