Summary: I introduced a problem into my custom resource script and had a hard time debugging it, because rgmanager didn't indicate any problem and claimed to start the service when it actually didn't.

[ RHCS using rgmanager-2.0.52-1.el5.centos on CentOS 5.3 ]

As part of my attempts to log, described in my earlier message to this list, I added a line near the top of my custom resource agent script:

  ocf_log info "Called $0 $*"

I didn't realize at the time that ocf_log also writes to stdout, meaning that when my script was called for meta-data, this line would add output to the XML and make it invalid.

Here's the service definition from my cluster.conf (with some details removed from the <dr> nodes):

  <rm log_level="6">
    <service autostart="1" name="dr" recovery="relocate">
      <ip address="10.6.9.130" monitor_link="1">
        <dr name="dr_a" ... other arguments ...>
        <dr name="dr_b" ... other arguments ...>
      </ip>
    </service>
  </rm>

Now, here's what I saw in /var/log/messages when I started rgmanager:

  Aug 23 16:35:02 drnode1 kernel: dlm: Using TCP for communications
  Aug 23 16:35:03 drnode1 clurgmgrd[13754]: <notice> Resource Group Manager Starting
  Aug 23 16:35:03 drnode1 clurgmgrd[13754]: <info> Loading Service Data
  Aug 23 16:35:03 drnode1 clurgmgrd: [13754]: <info> Called /usr/share/cluster/dr.sh meta-data
  Aug 23 16:35:04 drnode1 clurgmgrd[13754]: <info> Initializing Services
  Aug 23 16:35:04 drnode1 clurgmgrd[13754]: <info> Services Initialized
  Aug 23 16:35:04 drnode1 clurgmgrd[13754]: <info> State change: Local UP
  Aug 23 16:35:09 drnode1 clurgmgrd[13754]: <notice> Starting stopped service service:dr
  Aug 23 16:35:09 drnode1 clurgmgrd: [13754]: <info> Adding IPv4 address 10.6.9.130/23 to eth0
  Aug 23 16:35:10 drnode1 clurgmgrd[13754]: <notice> Service service:dr started

But in fact, it never called the resource script with "start" and never attempted to start dr_a or dr_b. This is because the metadata did not validate. When I removed that ocf_log statement from the top level of the script (and also later, when I wrapped it in an if so it only runs when $1 isn't "meta-data" -- see the sketch at the end of this message), the service began working again.

It was tricky to diagnose:

1. No indication that, when it called my script for meta-data, it got bad XML and decided the script was bad, or that the "dr" resource was not there, or something like that.

2. No indication that it could not find a valid resource agent script for a resource that my cluster.conf says is part of the service.

3. It claimed the service started, even though it never even attempted to start the resource. clustat claimed the service started, too.

4. No health checks ran, obviously, so it never noticed anything wrong.

What's going on here? rgmanager bugs? Is there anywhere I could look for indications of these problems? Shouldn't it be logging something about them? And above all, why did it claim the service started successfully?

-- 
Cos
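P.S. For anyone hitting the same thing, here's roughly the guard I ended up with near the top of dr.sh. This is only a minimal sketch: it assumes the script sources ocf-shellfuncs the way the stock /usr/share/cluster agents do, and the rest of dr.sh is specific to my setup and not shown.

  # dr.sh -- custom rgmanager resource agent (fragment)
  . $(dirname $0)/ocf-shellfuncs

  # ocf_log writes to stdout as well as the cluster log, so logging here
  # while answering a "meta-data" request pollutes the XML that rgmanager
  # parses.  Only log the invocation for the other operations.
  if [ "$1" != "meta-data" ]; then
          ocf_log info "Called $0 $*"
  fi

The other option is simply to keep all logging below whatever handles the meta-data operation, so nothing extra is ever printed before (or mixed into) the XML.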