Re: Starter Cluster / GFS

On 11/11/2010 04:48 PM, Digimer wrote:
Clustered storage *requires* fencing. Not using fencing is like driving
tired; it's just a matter of time before something bad happens. That
said, I should have been more clear in specifying the requirement for
fencing.

Now that said, the fencing shouldn't be needed at the SAN side, though
that works fine as well.

The default fencing action, last time I checked, is reboot. Consider the
use case where you have a network failure, with separate networks for
various things, and you lose connectivity between the nodes while they
all still have access to the SAN. One node gets fenced, reboots, comes
up and connects to the SAN. It connects to the quorum device, has
quorum without the other nodes, mounts the file systems and starts
writing, while all the other nodes that have become partitioned off do
the same thing. Unless you can fence the nodes from the SAN side, a quorum
device with a 50% weight is a recipe for disaster.

Agreed, and that is one of the major benefits of qdisk. It prevents a
50/50 split. Regardless though, say you have an eight-node cluster and
it partitions evenly with no qdisk to tie-break. In that case, neither
partition has >50% of the votes, so neither should have quorum. In turn,
neither should touch the SAN.

Exactly - qdisk is a tie-breaker. The point I was responding to was the suggestion of giving qdisk a 50% vote weight (i.e. qdisk + 1 node is enough for quorum), which is IMO not a sane way to do it.
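
For illustration, this is roughly what the configuration being argued against looks like in cluster.conf (a minimal sketch; the cluster name, node names and qdisk label are made up, per-node fencing sections are omitted, and exact attributes vary between releases):

    <cluster name="example" config_version="1">
      <clusternodes>
        <!-- four nodes, one vote each -->
        <clusternode name="node1" nodeid="1" votes="1"/>
        <clusternode name="node2" nodeid="2" votes="1"/>
        <clusternode name="node3" nodeid="3" votes="1"/>
        <clusternode name="node4" nodeid="4" votes="1"/>
      </clusternodes>
      <!-- qdisk carrying 50% of the total weight: 4 of 8 votes -->
      <quorumd label="qdisk" votes="4" interval="1" tko="10"/>
      <cman expected_votes="8"/>
    </cluster>

With 8 expected votes the quorum threshold is 5 (more than half), so any single node (1 vote) plus the qdisk (4 votes) reaches 5 and is quorate on its own, which is exactly the partition scenario described above.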

I'm well aware of how fencing works, but you overlooked one major
failure mode that is essentially guaranteed to hose your data if you set
up the quorum device to have 50% of the votes.

See above. 50% is not quorum.

No, but 50% + 1 node is quorum, and I'm saying that having qdisk (50%) + 1 node = quorum is not the way to go.
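
To put rough numbers on the two cases (a sketch, using the usual cman rule that quorum requires more than half of the expected votes):

    8 nodes x 1 vote, no qdisk:          expected = 8, quorum threshold = 5
      4/4 partition:                     4 < 5 on both sides, neither quorate

    4 nodes x 1 vote + qdisk x 4 votes:  expected = 8, quorum threshold = 5
      any 1 node + qdisk:                1 + 4 = 5, quorate on its own

The first case fails safe; the second is the one that will trash the SAN unless the fencing itself happens at the SAN.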

With SAN-side fencing, a fence takes the form of a logical disconnection
from the storage network. This has no inherent mechanism for recovery,
so the sysadmin has to manually recover the node(s). For this
reason, I prefer not to use it.

Then don't give the quorum device more weight than an individual node.

How does the number of nodes relate, in this case, to the SAN-side fence
recovery?

It doesn't, directly. I'm saying that the only way giving qdisk 50% of the votes toward quorum is sane is if your fencing is done by the SAN itself. Otherwise any one node that comes up has quorum, regardless of how many others are down, which in turn leads to multiple nodes being individually quorate when they connect to the SAN. That situation will trash the shared file system.
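
For completeness, the qdisk-as-tie-breaker arrangement being advocated here looks roughly like this in cluster.conf (a minimal sketch for a four-node cluster; the label, heuristic command, target address and timing values are only examples):

    <!-- qdisk carries no more weight than a single node -->
    <quorumd label="qdisk" votes="1" interval="1" tko="10" min_score="1">
      <heuristic program="ping -c1 -w1 192.168.0.1" score="1" interval="2"/>
    </quorumd>
    <cman expected_votes="5"/>  <!-- 4 node votes + 1 qdisk vote -->

With this, expected votes are 5 and the quorum threshold is 3, so a 2/2 split is broken in favour of the half that can still see the quorum disk and pass the heuristic, while a lone node plus the qdisk (1 + 1 = 2) still stays inquorate.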

Gordan

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster

