On Wed, 2006-03-29 at 12:59 +0200, Marco Lusini wrote:
> As a test I have created a script that always returns an error on
> status, and set up a service with the failing script as the single
> resource.
>
> ------- snip ------
> #!/bin/bash
> case "$1" in
> start)
>     exit 0
>     ;;
> stop)
>     exit 0
>     ;;
> status)
>     # Always report failure, to simulate a broken service.
>     exit 1
>     ;;
> esac
> ------ snip -------
>
> Now my cluster keeps restarting that resource or, if I change the
> recovery policy, keeps relocating it forever. If I choose disable as
> the recovery policy, the resource gets disabled without even trying
> to restart/recover.
>
> Is this the expected behaviour?

Yes.

> Is there a way to configure CS4 to first try to restart locally, then
> to try to relocate, and finally to disable the service?

If a service fails to start after a failure, it is relocated to another
node. If it fails to start on all nodes, it is placed in the 'stopped'
state.

Tracking the history of nodes where a service has started but at some
point failed is slightly difficult, since node IDs are not guaranteed
to be static in linux-cluster right now: a node can leave, and a
different node can join and take the vacant node ID. It is, however,
possible to configure static node IDs in cluster.conf, which would help
(see the fragment below).

If you are worried about this particular state (where the service is
horribly broken and keeps moving around or restarting), you can perform
checks in your script to see whether it just started and crashed
locally; this will help. A sketch of such a check follows below.

You can also file a bugzilla / feature request -- it's certainly not
impossible to implement at some point.

-- Lon

--
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
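For reference, the static node IDs Lon mentions are set with the nodeid
attribute on each clusternode entry in cluster.conf. A minimal fragment;
the cluster and node names here are invented:

------- snip ------
<?xml version="1.0"?>
<cluster name="mycluster" config_version="1">
    <clusternodes>
        <!-- Fixed nodeid values keep an ID from being reused by a
             different node that joins after this one leaves. -->
        <clusternode name="node1.example.com" votes="1" nodeid="1"/>
        <clusternode name="node2.example.com" votes="1" nodeid="2"/>
    </clusternodes>
</cluster>
------ snip -------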
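And a rough sketch of the "did it just start and crash locally?" check:
the script keeps a per-node state file of recent start times and refuses
to start once too many starts pile up in a short window, so rgmanager
falls through to its recovery policy instead of restarting the service
here forever. The daemon name, state file path, and thresholds are all
made up; adjust them for the real service.

------- snip ------
#!/bin/bash
# Hypothetical resource script with a local restart-storm check.
STATEFILE=/var/run/myservice.starts   # made-up path
MAXSTARTS=3                           # made-up threshold
WINDOW=300                            # seconds, made up

case "$1" in
start)
    now=$(date +%s)
    # Keep only start records from the last $WINDOW seconds.
    if [ -f "$STATEFILE" ]; then
        awk -v now="$now" -v w="$WINDOW" 'now - $1 < w' \
            "$STATEFILE" > "$STATEFILE.tmp" && mv "$STATEFILE.tmp" "$STATEFILE"
    fi
    echo "$now" >> "$STATEFILE"
    if [ "$(wc -l < "$STATEFILE")" -gt "$MAXSTARTS" ]; then
        # Started and crashed too often on this node: fail the start
        # so the service is relocated (or disabled) rather than
        # restarted locally again.
        exit 1
    fi
    /usr/sbin/mydaemon    # hypothetical daemon
    exit $?
    ;;
stop)
    killall mydaemon 2>/dev/null
    exit 0
    ;;
status)
    pidof mydaemon >/dev/null 2>&1
    exit $?
    ;;
esac
exit 0
------ snip -------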