Dear All,
I have a two-node cluster (rgmanager-3.0.12.1-17.el6.x86_64) with
shared storage. The storage also holds the quorum disk.
There are several services, and some dependencies are set between them.
We are testing what happens when the storage is disconnected from one
node (to see how the cluster responds to such a failure).
We start from a healthy cluster (everything is OK) and disconnect the
storage from the first node.
What I have observed:
1. The cluster fences node 1.
2. Node 2 tries to start the services, but even though 3 services
(say B, C, D) depend on a fourth one (say A), the cluster tries to
start them in this order: B, C, D, A.
Naturally the start fails for B, C, D, with the following messages:
Jul 29 15:49:54 node1 rgmanager[5135]: service:B is not runnable;
dependency not met
Jul 29 15:49:54 node2 rgmanager[5135]: Not stopping service:B: service
is failed
Jul 29 15:49:54 node2 rgmanager[5135]: Unable to stop RG service:B in
recoverable state
The services are left in the "recoverable" state even after service A
starts successfully (so the dependency is now met). Why is this happening?
I would expect rgmanager to start the services in an order that
satisfies the dependency relationships or, failing that, at least to
react to the service state change event (service A has started, so the
dependencies should be evaluated again).
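
To make the expectation concrete, here is a minimal Python sketch of the
two behaviours described above. It is only an illustration and not
rgmanager code: the helper names start_order and runnable_after_start,
the dependency map and the state strings are hypothetical, and only the
service names A to D and the "recoverable" state come from this test.

```python
from collections import deque


def start_order(depends_on):
    """Return a start order where each service follows its dependencies.

    depends_on maps a service name to the list of services it requires.
    """
    # Collect every service mentioned, as a key or as a dependency.
    services = set(depends_on)
    for deps in depends_on.values():
        services.update(deps)

    # Count unmet dependencies and record which services wait on which.
    unmet = {s: len(set(depends_on.get(s, ()))) for s in services}
    dependents = {s: [] for s in services}
    for svc, deps in depends_on.items():
        for dep in set(deps):
            dependents[dep].append(svc)

    # Services without dependencies can be started first.
    ready = deque(sorted(s for s in services if unmet[s] == 0))
    order = []
    while ready:
        svc = ready.popleft()
        order.append(svc)
        # Starting svc may satisfy the last missing dependency of others.
        for waiting in sorted(dependents[svc]):
            unmet[waiting] -= 1
            if unmet[waiting] == 0:
                ready.append(waiting)

    if len(order) != len(services):
        raise ValueError("dependency cycle detected")
    return order


def runnable_after_start(depends_on, states):
    """Return recoverable services whose dependencies are all started."""
    return sorted(
        svc
        for svc, state in states.items()
        if state == "recoverable"
        and all(states.get(dep) == "started" for dep in depends_on.get(svc, ()))
    )


# Services B, C and D depend on A, as in the scenario above.
deps = {"A": [], "B": ["A"], "C": ["A"], "D": ["A"]}
print(start_order(deps))

# Hypothetical states once A has come up on node 2.
states = {"A": "started", "B": "recoverable", "C": "recoverable", "D": "recoverable"}
print(runnable_after_start(deps, states))
```

With the map above, the first call puts A ahead of B, C and D, and the
second call lists B, C and D as ready to be retried once A is started.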
What can be done about it?
Thank you in advance,
Laszlo
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster