Cluster node without access to all resources - trouble

Janne Peltonen <janne.peltonen@xxxxxxxxxxx> · Thu, 28 Jun 2007 18:45:37 +0300

Hi.

I'm running a five node cluster. Four of the nodes run services that
need access to a SAN, but the fifth doesn't. (The fifth node belongs to
the cluster to avoid a cluster with an even number of nodes.
Additionally, the fifth node is a stand-alone rack server, while the
four other nodes are blade servers, two in each of two different blade
racks - this way, even if either of the blade racks goes down, I won't
lose the cluster.) This seems to create all sorts of trouble. For
example, if I try to manipulate clvm'd filesystems on the other four
nodes, they refuse to commit changes if the fifth node is up. And even
if I've restricted the SAN-access-needing services to run only on the
four nodes that have the access, the cluster system tries to shut the
services down on the fifth node as well (when quorum is lost, for example)
- and complains about being unable to stop them and, on the nodes that
should run the services, refuses to restart them until I've removed the
fifth node from the cluster and fenced it. (Or, rather, I've removed the
fifth node from the cluster and one of the other nodes has successfully
fenced it.)
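The reason for the fifth node is plain majority-quorum arithmetic. Below is a minimal sketch, assuming one vote per node, no quorum disk, and quorum taken as floor(n/2) + 1. The function names `quorum` and `survives` are hypothetical helpers for illustration, not part of any cluster tool.

```python
def quorum(total_votes: int) -> int:
    """Simple majority, assuming one vote per node: more than half of all votes."""
    return total_votes // 2 + 1


def survives(total_nodes: int, lost_nodes: int) -> bool:
    """True if the nodes remaining after losing `lost_nodes` still hold quorum."""
    return total_nodes - lost_nodes >= quorum(total_nodes)


# Compare four blades alone with four blades plus the stand-alone fifth node,
# when one blade rack (two nodes) goes down.
for total in (4, 5):
    print(total, "nodes: quorum =", quorum(total),
          "| survives losing one blade rack:", survives(total, 2))
```

With four nodes, losing a rack leaves 2 of the required 3 votes. With five, the remaining 3 still make quorum.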

So.

Is it really necessary that all the members in a cluster have access to
all the resources that any of the members have, even if the services in
the cluster are partitioned to run in only a part of the cluster? Or is
there a way to tell the cluster that it shouldn't care about the fifth
member's opinion about certain services; that is, it doesn't need to
check if the services are running on it, because they never do. Or
should I just make sure that the fifth member always comes up last (that
is, won't be running while the others are coming up)? Or should I accept
that letting the fifth node belong to the cluster causes more harm than
it prevents, and just run it outside the cluster?

Sorry if this was incoherent. I'm a bit tired; this system should be in
production in two weeks, and unexpected problems (that didn't come up
during testing) keep coming up... Any suggestions would be greatly
appreciated.

--Janne
-- 
Janne Peltonen <janne.peltonen@xxxxxxxxxxx>

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster