After thinking about it a bit more, I noticed that the Apache log shows the "caught SIGTERM, shutting down" message 1 second after the start message. I thought maybe Pacemaker wasn't allowing Apache enough time to start, so I explicitly set the timeout for the start operation to 40s (the default should already be 40s; see the bottom of this message for my config). This did not fix the problem.

I did find /usr/lib/ocf/resource.d/heartbeat/apache, which is what Pacemaker uses to start, stop, and monitor Apache. When I run it manually to start Apache, "waiting for apache /etc/apache2/httpd.conf to come up" is followed IMMEDIATELY by the kill attempt. It does not wait 40s for the start to time out:

# OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/apache start
apache[27172]: INFO: apache not running
apache[27172]: INFO: waiting for apache /etc/apache2/httpd.conf to come up
/usr/lib/ocf/resource.d/heartbeat/apache: line 440: kill: (27389) - No such process
apache[27172]: INFO: Killing apache PID 27389
apache[27172]: INFO: apache stopped.

If I try to monitor Apache while it's off:

# OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/apache monitor
apache[30211]: INFO: apache not running

... which is correct. If I then manually start Apache and run "monitor", it shows that it's running, so Pacemaker *could* tell that Apache is running if it was working right:

# rcapache2 start
Starting httpd2 (prefork) done
# rcapache2 start
Apache is already running (/var/run/httpd2.pid) done
# OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/apache monitor

(No error message; "/usr/lib/ocf/resource.d/heartbeat/apache monitor" is showing that Apache is running.)

So the problem seems to be that Pacemaker starts Apache and immediately checks whether it's running. When it's not running a split second later, Pacemaker (or more precisely /usr/lib/ocf/resource.d/heartbeat/apache) kills the process without waiting for it to start.

Any suggestions?
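For anyone repeating the manual test, it may help to print the agent's exit status rather than infer the result from the absence of an error message. By OCF convention, "monitor" exits 0 when the resource is running and 7 when it is cleanly stopped. The sketch below is only an illustration: the helper names ocf_rc_name and run_agent are made up for this example, the paths are the ones used in this message, and passing the config file as OCF_RESKEY_configfile assumes the usual OCF convention that resource parameters reach the agent as OCF_RESKEY_<name> environment variables.

```shell
#!/bin/sh
# Sketch: run an OCF resource agent by hand and report its exit code by name.
# ocf_rc_name and run_agent are hypothetical helpers, not part of Pacemaker.

# Map an OCF exit code to its standard symbolic name.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        *) echo "UNKNOWN($1)" ;;
    esac
}

# Run one agent action (start, stop, monitor) with the environment set
# explicitly. AGENT and CONFIGFILE default to the paths from this message.
run_agent() {
    action=$1
    agent=${AGENT:-/usr/lib/ocf/resource.d/heartbeat/apache}
    OCF_ROOT=/usr/lib/ocf \
    OCF_RESKEY_configfile=${CONFIGFILE:-/etc/apache2/httpd.conf} \
        "$agent" "$action"
    rc=$?
    echo "$action exited with $rc ($(ocf_rc_name "$rc"))"
    return "$rc"
}

# Example (run as root on a cluster node):
#   run_agent monitor
#   run_agent start
```

With Apache stopped, "run_agent monitor" should report OCF_NOT_RUNNING. Any other non-zero code after a manual "rcapache2 start" would suggest that the agent's own check is failing even though the daemon is up.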
node install0
node install1
primitive ClusterIP ocf:heartbeat:IPaddr2 \
    params ip="192.168.1.24" cidr_netmask="32" \
    op monitor interval="30s"
primitive FileSystemDRBD ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" directory="/home/src" fstype="ext3" \
    operations $id="FileSystemDRBD-operations" \
    op start interval="0" timeout="60" \
    op stop interval="0" timeout="60" \
    op monitor interval="20" timeout="40" start-delay="0" \
    op notify interval="0" timeout="60"
primitive VolumeDRBD ocf:linbit:drbd \
    params drbd_resource="install" \
    operations $id="VolumeDRBD-operations" \
    op start interval="0" timeout="240" \
    op promote interval="0" timeout="90" \
    op demote interval="0" timeout="90" \
    op stop interval="0" timeout="100" \
    op monitor interval="10" timeout="20" start-delay="0" \
    op notify interval="0" timeout="90" \
    meta target-role="started"
primitive WebSite ocf:heartbeat:apache \
    operations $id="WebSite-operations" \
    op start interval="0" timeout="40s" \
    op stop interval="0" timeout="60s" \
    op monitor interval="10" timeout="20" start-delay="0" \
    meta target-role="started"
group Cluster ClusterIP FileSystemDRBD WebSite \
    meta target-role="Started"
ms MasterDRBD VolumeDRBD \
    meta clone-max="2" notify="true" target-role="started"
colocation WebServerWithIP inf: Cluster MasterDRBD:Master
order StartFileSystemFirst inf: MasterDRBD:promote Cluster:start
property $id="cib-bootstrap-options" \
    dc-version="1.1.5-ecb6baaf7fc091b023d6d4ba7e0fce26d32cf5c8" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1320896853"

On 11/09/2011 06:43 PM, Earl Ruby wrote:
> I've set up a 2-node Corosync cluster with Master/Slave DRBD, ClusterIP,
> a Filesystem resource, and Apache.
>
> Everything works fine except Apache. I can start Apache from the command
> line just fine, but when I shut it off on both nodes and then run:
>
> crm resource cleanup WebSite
>
> It fails to start.
> The Apache error_log on both nodes shows two lines
> each time I run cleanup:
>
> [Thu Nov 10 02:37:33 2011] [notice] Apache/2.2.17 (Linux/SUSE)
> mod_ssl/2.2.17 OpenSSL/1.0.0c mod_perl/2.0.5 Perl/v5.12.3 configured --
> resuming normal operations
> [Thu Nov 10 02:37:34 2011] [notice] caught SIGTERM, shutting down
>
> "grep -i apache /var/log/corosync.log" gives no useful info.
>
> Any idea on what command Pacemaker uses to start Apache? As I said, *I*
> can start it from the command line no problem, but Pacemaker fails.
>
> Any suggestions on how I should go about troubleshooting this? What I
> should be looking at?
>
> My config looks like this:
>
> node install0
> node install1
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>     params ip="192.168.1.24" cidr_netmask="32" \
>     op monitor interval="30s"
> primitive FileSystemDRBD ocf:heartbeat:Filesystem \
>     params device="/dev/drbd0" directory="/home/src" fstype="ext3" \
>     op monitor interval="60" timeout="40" start-delay="10" \
>     op start interval="0" timeout="60" \
>     op stop interval="0" timeout="60"
> primitive VolumeDRBD ocf:linbit:drbd \
>     params drbd_resource="install" \
>     operations $id="VolumeDRBD-operations" \
>     op start interval="0" timeout="240" \
>     op promote interval="0" timeout="90" \
>     op demote interval="0" timeout="90" \
>     op stop interval="0" timeout="100" \
>     op monitor interval="10" timeout="20" start-delay="0" \
>     op notify interval="0" timeout="90" \
>     meta target-role="started"
> primitive WebSite ocf:heartbeat:apache \
>     params configfile="/etc/apache2/httpd.conf" \
>     op monitor interval="1min"
> group Cluster ClusterIP FileSystemDRBD WebSite \
>     meta target-role="Started"
> ms MasterDRBD VolumeDRBD \
>     meta clone-max="2" notify="true" target-role="started"
> colocation WebServerWithIP inf: Cluster MasterDRBD:Master
> order StartFileSystemFirst inf: MasterDRBD:promote Cluster:start
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.5-ecb6baaf7fc091b023d6d4ba7e0fce26d32cf5c8" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore" \
>     last-lrm-refresh="1320891100"
>

--
Earl C. Ruby III
Director of Engineering
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss