On Thu, Nov 27, 2008 at 15:57, Brian Kroth <bpkroth@xxxxxxxxx> wrote:
> Hello all,
>
> I've been using Heartbeat in the past to do resource failover with the
> following scheme:
>
> 1) Each node in the cluster runs a dummy monitoring resource agent as a
> clone. This resource agent monitors the health of a service on the node
> using whatever rules one wants to write into it. For instance, make
> sure the service is not in maintenance mode, mysql is running, queries
> return in a timely fashion, and replication is up to date. If all the
> checks pass, it uses attrd_updater to set an attribute for that service
> on the node to 1; otherwise, it sets it to 0. Note that this resource
> agent in no way affects the service it is monitoring.
>
> 2) The cluster configuration uses the attributes for each of the
> monitored services to generate a score for the machine. The machine
> with the highest score gets to host the virtual IP for that service.
>
> This scheme allows one to, for instance, touch a file on a machine to
> signify that it is in maintenance mode. The service IP would then be
> moved to another node, leaving one free to test the service via the
> machine's management IP without removing the machine from the cluster
> itself, which would cause a loss of GFS access. It also provides more
> granular monitoring of each service.
>
> I want to know how I would configure rgmanager with something similar
> to this - to have resource agents that continually monitor the status
> of a service on each node and then move service IPs accordingly.

Just out of interest, where did the rgmanager requirement come from?

<blatant-advertisement>
The Heartbeat resource manager also runs on OpenAIS now which, IIRC, is
what rgmanager uses... so, in theory, it can manage anything rgmanager
can.
</blatant-advertisement>

> I see that one can write one's own agents, but I don't see a scoring
> scheme anywhere. My concern is that if I simply write an agent to
> monitor a service and have an IP depend upon the return code of that
> monitoring agent, the service would never be failed back to the
> original host.
>
> Does this make sense?
>
> Thanks,
> Brian
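
For reference, here is a rough sketch of the attribute-based scheme
described above as it looks with the Heartbeat/Pacemaker tooling. The
attribute name (mysql_health), the specific health checks, and the
resource names are only illustrative, and the attrd_updater flags shown
(-n name, -v value) are the Heartbeat 2.x-era form, so check the man
page on your version before copying any of this.

    #!/bin/sh
    # Sketch of the monitor action of the dummy cloned agent.
    # It never touches the service itself; it only publishes a
    # per-node attribute that the CIB can turn into a score.

    ATTR=mysql_health    # hypothetical attribute name

    monitor() {
        if [ ! -f /etc/maintenance ] \
           && /etc/init.d/mysql status >/dev/null 2>&1 \
           && mysql -e 'SELECT 1' >/dev/null 2>&1; then
            attrd_updater -n "$ATTR" -v 1    # node looks healthy
        else
            attrd_updater -n "$ATTR" -v 0    # drop the node's score
        fi
        return 0    # the monitoring agent itself always succeeds
    }

The scoring half can then be expressed as a location rule that uses the
attribute's value as the node's score for the virtual IP, in the same
way the documented pingd connectivity example does (crm shell syntax;
again, the names are made up):

    location mysql_ip_on_healthy_node mysql_ip \
        rule mysql_health: defined mysql_health

Because the node with the higher attribute value always wins, the IP
fails back automatically once the attribute returns to 1 on the
original host, which is the fail-back behaviour the scoring scheme
provides and a plain return-code dependency would not.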