Andrew Lacey wrote:
> Very informative post...thanks! The scenario you mentioned with a dead
> switch port (or a single unplugged network cable, or whatever) is
> something I had thought about, and I considered it to be a strike against
> using a crossover cable.
How does that follow? With a switch in the middle your points of failure
are:
cable, switch, cable
With just a crossover cable (actually, it doesn't have to be crossover -
99% of NICs made in the past few years auto-detect and auto-negotiate
whether they need to cross-over or not, so you can just use a
straight-through cable - but that's getting off topic), you only have a
single cable as a point of failure. That is certainly better than the
alternative.
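
To put rough numbers on the point-of-failure argument above: components in series only work when every one of them works, so their availabilities multiply. The following is a minimal sketch; the availability figures are made up purely for illustration and are not measurements of any real hardware.

```python
def series_availability(*components):
    """Availability of a chain that works only if every component works."""
    result = 1.0
    for availability in components:
        result *= availability
    return result


# Hypothetical figures, chosen only to illustrate the comparison.
CABLE = 0.999
SWITCH = 0.995

via_switch = series_availability(CABLE, SWITCH, CABLE)  # cable, switch, cable
direct = series_availability(CABLE)                     # single cable

print(f"via switch: {via_switch:.6f}")
print(f"direct:     {direct:.6f}")
```

Whatever figures you plug in, adding components to the chain can only lower the product, which is why the direct link comes out ahead.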
> But, this "monitor_link" sounds like it might be
> exactly what I've been looking for. I'll research that and see what I can
> find.
You don't need that on your cluster interface though. If the NIC or
cable dies, the cluster will lose the connection to the other node and fence
it. If you have something like iLO on multiple interfaces, you can
specify multiple fencing devices, to ensure that you manage to fence the
other node, regardless of which interface fails. But the crossover
interface connecting the nodes is arguably the most reliable part of
your 2-node cluster because it has the fewest components.
> You asked in your other post how I can tell the difference between a
> network outage that should cause a fence and one that shouldn't. What I
> wanted to do was set it up so that a node that can't reach the switch will
> never try to fence the other node. That way, if the switch is down and
> nobody can reach it, then nobody will fence. If there is a single port
> failure and one node can still reach the switch, then it will fence the
> other node and take over the services.
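
The rule described above (never fence unless you can still reach the switch yourself) can be sketched as a small decision function. This is only an illustration of the logic, not something the cluster software provides: `should_fence` and `can_reach` are hypothetical names, and the reachability check assumes the switch (or some host behind it) answers ping on Linux.

```python
import subprocess


def can_reach(host: str, timeout_s: int = 1) -> bool:
    """Return True if a single ping to host succeeds (Linux ping flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def should_fence(peer_responding: bool, switch_reachable: bool) -> bool:
    """Fence only if the peer is silent AND this node can still see the switch."""
    if peer_responding:
        return False          # nothing is wrong with the peer
    return switch_reachable   # switch down or own port dead: do not fence
```

With this rule, a dead switch leaves both nodes idle, while a single dead port lets only the node that still has connectivity fence the other.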
Is your switch managed? If so, you can use it as a fencing device:
simply have a node disable the other node's port. That way any
subsequent attempts by the other node, to fence or do anything else,
will not get anywhere. You may need to write your own fencing agent for
that, though. I asked about the fencing agent API in an earlier post, and
there appears to be no conclusive documentation for it. I've been meaning to
implement a fencing agent for exactly this sort of thing (fencing by
disabling the switch port) on a 3Com switch.
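
As a rough idea of what such an agent could look like, here is a minimal sketch. It rests on several assumptions: that the agent receives its options as name=value lines on standard input, that the option names `ipaddr`, `community` and `port` (all hypothetical here) are what the cluster configuration would pass, and that the switch accepts SNMP writes to the standard IF-MIB ifAdminStatus object. Whether a given 3Com model does is something to check against its documentation. The sketch only builds the net-snmp `snmpset` command and does not run it.

```python
IF_ADMIN_STATUS_OID = "1.3.6.1.2.1.2.2.1.7"  # IF-MIB::ifAdminStatus
ADMIN_DOWN = 2                               # up(1), down(2), testing(3)


def parse_options(lines):
    """Parse name=value lines, skipping blanks, comments and malformed lines."""
    opts = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        opts[key.strip()] = value.strip()
    return opts


def build_port_down_command(opts):
    """Build an snmpset command that administratively disables one port."""
    oid = f"{IF_ADMIN_STATUS_OID}.{int(opts['port'])}"  # port = ifIndex
    return ["snmpset", "-v2c", "-c", opts["community"],
            opts["ipaddr"], oid, "i", str(ADMIN_DOWN)]


# Hypothetical input, as it might arrive on standard input.
sample = """ipaddr=192.0.2.10
community=private
port=7
"""
command = build_port_down_command(parse_options(sample.splitlines()))
print(" ".join(command))
```

A real agent would read `sys.stdin` instead of the sample string, run the command, verify that the port actually went down, and exit non-zero on any failure so that the cluster does not assume the node was fenced.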
Gordan