On 11/10/2011 05:05 AM, Earl Ruby wrote:
> After thinking about it a bit more I noticed that the Apache log shows
> the "caught SIGTERM, shutting down" message 1 second after the start
> message, so I thought maybe Pacemaker wasn't allowing Apache enough time
> to start, so I manually set the timeout for the start operation to 40s
> (by default it should be 40s already) (see bottom of message for my config).
>
> This did not fix the problem.
>
> I did find /usr/lib/ocf/resource.d/heartbeat/apache, which is what
> Pacemaker uses to start, stop, and monitor Apache. When I run it
> manually, to start Apache, "waiting for apache /etc/apache2/httpd.conf
> to come up" is followed IMMEDIATELY by the kill attempt. It does not
> wait 40s for the start to timeout:
>
> # OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/apache start
> apache[27172]: INFO: apache not running
> apache[27172]: INFO: waiting for apache /etc/apache2/httpd.conf to come up
> /usr/lib/ocf/resource.d/heartbeat/apache: line 440: kill: (27389) - No
> such process
> apache[27172]: INFO: Killing apache PID 27389
> apache[27172]: INFO: apache stopped.
>
>
> If I try to monitor Apache while it's off:
>
> # OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/apache monitor
> apache[30211]: INFO: apache not running
>
>
> ... which is correct. If I then manually start Apache and then run
> "monitor" it shows that it's running, so Pacemaker *could* tell that
> Apache is running if it was working right:
>
> # rcapache2 start
> Starting httpd2 (prefork)
> done
>
> # rcapache2 start
> Apache is already running (/var/run/httpd2.pid)
> done
>
> # OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/apache monitor
>
> (no error message, "/usr/lib/ocf/resource.d/heartbeat/apache monitor" is
> showing that Apache is running.)
>
>
> So the problem seems to be that Pacemaker starts Apache, immediately
> checks to see if it's running and when it's not running a split second
> later Pacemaker (or more precisely
> /usr/lib/ocf/resource.d/heartbeat/apache) then kills the process without
> waiting for it to start.
>
> Any suggestions?
>
>
>
> node install0
> node install1
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>     params ip="192.168.1.24" cidr_netmask="32" \
>     op monitor interval="30s"
> primitive FileSystemDRBD ocf:heartbeat:Filesystem \
>     params device="/dev/drbd0" directory="/home/src" fstype="ext3" \
>     operations $id="FileSystemDRBD-operations" \
>     op start interval="0" timeout="60" \
>     op stop interval="0" timeout="60" \
>     op monitor interval="20" timeout="40" start-delay="0" \
>     op notify interval="0" timeout="60"
> primitive VolumeDRBD ocf:linbit:drbd \
>     params drbd_resource="install" \
>     operations $id="VolumeDRBD-operations" \
>     op start interval="0" timeout="240" \
>     op promote interval="0" timeout="90" \
>     op demote interval="0" timeout="90" \
>     op stop interval="0" timeout="100" \
>     op monitor interval="10" timeout="20" start-delay="0" \
>     op notify interval="0" timeout="90" \
>     meta target-role="started"
> primitive WebSite ocf:heartbeat:apache \
>     operations $id="WebSite-operations" \
>     op start interval="0" timeout="40s" \
>     op stop interval="0" timeout="60s" \
>     op monitor interval="10" timeout="20" start-delay="0" \
>     meta target-role="started"
> group Cluster ClusterIP FileSystemDRBD WebSite \
>     meta target-role="Started"
> ms MasterDRBD VolumeDRBD \
>     meta clone-max="2" notify="true" target-role="started"
> colocation WebServerWithIP inf: Cluster MasterDRBD:Master
> order StartFileSystemFirst inf: MasterDRBD:promote Cluster:start
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.5-ecb6baaf7fc091b023d6d4ba7e0fce26d32cf5c8" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore" \
>     last-lrm-refresh="1320896853"
>
>
>
> On 11/09/2011 06:43 PM, Earl Ruby wrote:
>> I've set up a 2-node Corosync cluster with Master/Slave DRBD, ClusterIP,
>> a Filesystem resource, and Apache.
>>
>> Everything works fine except Apache. I can start Apache from the command
>> line just fine, but when I shut it off on both nodes and then run:
>>
>> crm resource cleanup WebSite
>>
>> It fails to start. The Apache error_log on both nodes shows two lines
>> each time I run cleanup:
>>
>> [Thu Nov 10 02:37:33 2011] [notice] Apache/2.2.17 (Linux/SUSE)
>> mod_ssl/2.2.17 OpenSSL/1.0.0c mod_perl/2.0.5 Perl/v5.12.3 configured --
>> resuming normal operations
>> [Thu Nov 10 02:37:34 2011] [notice] caught SIGTERM, shutting down
>>
>> "grep -i apache /var/log/corosync.log" gives no useful info.
>>
>> Any idea on what command Pacemaker uses to start Apache? As I said, *I*
>> can start it from the command line no problem, but Pacemaker fails.
>>
>> Any suggestions on how I should go about troubleshooting this? What I
>> should be looking at?

The default monitor requests the status URL of Apache ... typically
mod_status is not enabled, and therefore the monitoring fails. Either
enable mod_status for local requests or change the "statusurl" parameter.

crm ra info / man ocf_heartbeat_apache ... are your friend ;-)

Regards,
Andreas

--
Need help with Pacemaker/Corosync/DRBD?
http://www.hastexo.com/now

>>
>> My config looks like this:
>>
>> node install0
>> node install1
>> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>>     params ip="192.168.1.24" cidr_netmask="32" \
>>     op monitor interval="30s"
>> primitive FileSystemDRBD ocf:heartbeat:Filesystem \
>>     params device="/dev/drbd0" directory="/home/src" fstype="ext3" \
>>     op monitor interval="60" timeout="40" start-delay="10" \
>>     op start interval="0" timeout="60" \
>>     op stop interval="0" timeout="60"
>> primitive VolumeDRBD ocf:linbit:drbd \
>>     params drbd_resource="install" \
>>     operations $id="VolumeDRBD-operations" \
>>     op start interval="0" timeout="240" \
>>     op promote interval="0" timeout="90" \
>>     op demote interval="0" timeout="90" \
>>     op stop interval="0" timeout="100" \
>>     op monitor interval="10" timeout="20" start-delay="0" \
>>     op notify interval="0" timeout="90" \
>>     meta target-role="started"
>> primitive WebSite ocf:heartbeat:apache \
>>     params configfile="/etc/apache2/httpd.conf" \
>>     op monitor interval="1min"
>> group Cluster ClusterIP FileSystemDRBD WebSite \
>>     meta target-role="Started"
>> ms MasterDRBD VolumeDRBD \
>>     meta clone-max="2" notify="true" target-role="started"
>> colocation WebServerWithIP inf: Cluster MasterDRBD:Master
>> order StartFileSystemFirst inf: MasterDRBD:promote Cluster:start
>> property $id="cib-bootstrap-options" \
>>     dc-version="1.1.5-ecb6baaf7fc091b023d6d4ba7e0fce26d32cf5c8" \
>>     cluster-infrastructure="openais" \
>>     expected-quorum-votes="2" \
>>     stonith-enabled="false" \
>>     no-quorum-policy="ignore" \
>>     last-lrm-refresh="1320891100"
>>
>
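One way to test the diagnosis in the reply above (the monitor fails because the status URL is not served) is to repeat a similar check by hand, outside the cluster. The following is only a rough sketch, not the resource agent's actual code. It assumes the monitor fetches http://localhost/server-status and looks for a closing HTML tag in the response; both values are assumptions here and should be confirmed against the agent's own documentation. The helper names body_matches and check_status are made up for illustration.

```shell
#!/bin/sh
# Sketch only: imitates, by hand, the kind of check the apache resource
# agent's monitor is assumed to perform. STATUSURL and TESTREGEX are
# assumed defaults; override them from the environment if yours differ.
STATUSURL="${STATUSURL:-http://localhost/server-status}"
TESTREGEX="${TESTREGEX:-</ *html *>}"

# body_matches (hypothetical helper): reads a page body on stdin and
# succeeds only if it matches the regex in $1, case-insensitively.
body_matches() {
    tr '\n' ' ' | grep -Eiq -- "$1"
}

# check_status (hypothetical helper): fetches the status URL and tests
# the body. An empty body (for example when the request is refused or
# forbidden) does not match, so the check fails.
check_status() {
    wget -q -O- "$STATUSURL" | body_matches "$TESTREGEX"
}

# Usage, on the node where Apache was started by hand:
#   check_status && echo "status page OK" || echo "status page check failed"
```

If check_status fails while Apache is running, that is consistent with the explanation in the reply: either mod_status needs to be enabled for local requests, or the resource's "statusurl" parameter needs to point at a page that does load. "crm ra info ocf:heartbeat:apache" lists the parameters the agent accepts.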
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss