On 18/06/2009, at 9:42 PM, Gordan Bobic wrote:
On Thu, 18 Jun 2009 17:11:42 +0530, Brahadambal Srinivasan
<brahadambal@xxxxxxxxx> wrote:
2. Fencing - any special methods to fence?
Just be aware that if your site interconnect goes down, you'll end up with a hung cluster, since the nodes will disconnect and be unable to fence each other. You could offset that by having separate cluster and fencing interconnects, but you would also need to look into quorum - you need n/2+1 nodes for quorum, so to make this work sensibly you'd need at least three sites - otherwise if you lose the bigger site you lose the whole cluster anyway.
This question came up last week as well, so I have been thinking about the options here. Gordan's suggestion of three sites is a good one, but it may not be feasible for some.
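To make the quorum arithmetic concrete, something like the following is what I understand the three-site layout to mean - one node per site, one vote each, so quorum is floor(3/2)+1 = 2 and any single site can drop out. This is only an untested sketch; the node names are placeholders and I've left the fencing sections out:

    <cman expected_votes="3"/>
    <clusternodes>
      <!-- one vote per node, one node per site: losing any one site
           still leaves 2 of 3 votes, so the survivors stay quorate -->
      <clusternode name="site-a-node1" nodeid="1" votes="1"/>
      <clusternode name="site-b-node1" nodeid="2" votes="1"/>
      <clusternode name="site-c-node1" nodeid="3" votes="1"/>
    </clusternodes>

With only two sites split, say, 3+2, expected_votes is 5 and quorum is 3, so losing the three-node site leaves the smaller site inquorate - which is exactly Gordan's point.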
If you are using replicated SAN LUN(s) for your shared storage, the
LUN is only ever going to be active at one site. So, if you lose
connectivity between sites you obviously want the cluster to remain
operational at the site with the active storage LUN. I can imagine a
cross-site accessible qdisk *almost* solving this problem.
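Roughly what I have in mind for the qdisk side, as an untested sketch only - the device label, heuristic target and timings are all placeholders:

    <quorumd interval="1" tko="10" votes="1" label="site_qdisk">
      <!-- only claim the qdisk vote while we can reach a host on the
           storage side at the active site; the address is a placeholder
           and the votes value would need tuning so that the site
           holding the active LUN stays quorate -->
      <heuristic program="ping -c1 -w1 192.168.0.1" score="1" interval="2" tko="3"/>
    </quorumd>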
The remaining issue, as I see it, is that if network connectivity is lost, the cluster will pause all services until it has successfully fenced the failed nodes - and if it can't fence those nodes because of the lost network connectivity, you can end up with a site that effectively has quorum but whose services are all still hung. This would especially arise if, for example, you lost ethernet connectivity but not FC/storage connectivity - the nodes at the remote site would still be able to access the qdisk. Perhaps a combination of power fencing (via ethernet) plus storage fencing (on the local side of the SAN) could make this a workable solution?
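As a rough, untested sketch of what that fencing combination might look like (the agents, names, ports and addresses are placeholders): list the power fence as the first method and a fence on the node's local FC switch port as the second, so the SAN-side fence is only used when the power fence can't be reached over ethernet:

    <clusternode name="site-b-node1" nodeid="2" votes="1">
      <fence>
        <!-- method 1: power fence over the management ethernet -->
        <method name="1">
          <device name="ipmi-node1"/>
        </method>
        <!-- method 2: disable the node's port on the local FC switch
             if the power fence is unreachable -->
        <method name="2">
          <device name="fc-switch-a" port="4"/>
        </method>
      </fence>
    </clusternode>
    ...
    <fencedevices>
      <fencedevice agent="fence_ipmilan" name="ipmi-node1" ipaddr="10.0.0.21" login="admin" passwd="..."/>
      <fencedevice agent="fence_brocade" name="fc-switch-a" ipaddr="10.0.0.50" login="admin" passwd="..."/>
    </fencedevices>

fenced tries the methods in the order they are listed, so the storage fence only kicks in after the power fence fails.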
Regards,
Tom
--
Tom Lanyon
Senior Systems Engineer
NetSpot Pty Ltd