Hi Brem,

Thanks for the pointers. The link to "OCF RA API Draft" appears to answer
my questions. It will take a while to digest all that.

I think you had a typo - "clusvcadm -D myfailedservice" should be
"clusvcadm -d myfailedservice".

My service (mysql) was failing because "shutdown_wait" was too low,
causing stops and restarts to fail. Sure enough, your suggestion works:

    sudo /usr/sbin/clusvcadm -d mysql_service
    (fix config)
    sudo /usr/sbin/clusvcadm -e mysql_service

And I suppose that if the service is in a mess on its current node - e.g.
a software error prevents shutdown - then I would disable and then
relocate the service:

    sudo /usr/sbin/clusvcadm -d mysql_service
    sudo /usr/sbin/clusvcadm -e mysql_service -m othernode

regards,
Martin

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of brem belguebli
Sent: 23 October 2009 19:21
To: linux clustering
Subject: Re: some questions about rgmanager

2009/10/23 Martin Waite <Martin.Waite@xxxxxxxxxxxx>:
> Hi,
>
> Are there any guidelines about how to write resource scripts that will
> be run by rgmanager /clurgmgrd ?
>
> I have been tracing execution through rg_test, but I don't know how
> representative this is. For example, performing a service check through
> rg_test calls just about every script in /usr/share/cluster with the
> "meta-data" command, then calling service.sh with command "status", and
> finally the resource script with the command "status". Is this what
> will happen when clurgmgrd starts or stops a service ?
>
> Is there a specification covering the environment variables supplied to
> the resource scripts - e.g. OCF_RESOURCE_INSTANCE ?

Useful info can be found at
http://sources.redhat.com/cluster/wiki/RGManager

>
> Are the actions of the various scripts documented or specified somewhere
> ? Do they tend to change across releases ?
>
> Is there a standard way of extending the monitoring performed by the
> scripts, or do I just edit the supplied scripts to suit ?
>
> During experiments in configuring a service, the cluster often reached a
> state where clustat reports a service as "failed". What is the best way
> of recovering from this state ? I cannot see that clusvcadm can be used
> to recover from this state, and so far the only path to recovery appears
> to be to restart rgmanager on all cluster nodes.
>

From my experience, there is no need to restart rgmanager: just disable
the failed service (clusvcadm -D myfailedservice), find out and fix what
caused the service to fail (in general, scripting errors), then restart
the service (clusvcadm -e myfailedservice).

> Thanks in advance for any pointers on this.
>
> -- Martin
>
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster
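
The recovery sequence discussed in this thread (disable the failed service, fix the configuration, re-enable it, optionally on another node) can be sketched as a small shell helper. This is only an illustration, not something taken from rgmanager itself: the function name recover_service and the CLUSVCADM and DRY_RUN variables are made up for this example, and the only clusvcadm options used are the -d, -e and -m shown in the messages above. It prints the commands by default rather than running them; to run them for real it would need root (or sudo, as in the thread).

```shell
#!/bin/sh
# Sketch of the "disable, fix, re-enable" recovery for a failed service.
# CLUSVCADM, DRY_RUN and recover_service are hypothetical names used only
# for this example; they are not part of rgmanager.
CLUSVCADM="${CLUSVCADM:-/usr/sbin/clusvcadm}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (the safe default)

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# recover_service SERVICE [TARGET_NODE]
recover_service() {
    service="${1:-}"
    target="${2:-}"
    if [ -z "$service" ]; then
        echo "usage: recover_service SERVICE [TARGET_NODE]" >&2
        return 2
    fi

    # Step 1: disable the failed service (lower-case -d, per the thread).
    run "$CLUSVCADM" -d "$service" || return 1

    # Step 2 is manual: fix whatever made the service fail
    # (in the thread, a "shutdown_wait" value that was too low).

    # Step 3: re-enable the service, on another member if one is given.
    if [ -n "$target" ]; then
        run "$CLUSVCADM" -e "$service" -m "$target"
    else
        run "$CLUSVCADM" -e "$service"
    fi
}

# Example calls (not executed here):
#   recover_service mysql_service
#   recover_service mysql_service othernode
```

Defaulting to dry-run is a deliberate choice here, since the fix in step 2 has to happen by hand between the disable and the enable; in practice the two halves would probably be run separately.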