On Fri, Jul 10, 2009 at 04:50:12PM -0700, Rick Stevens wrote:
> jason@xxxxxxxxxxxxxx wrote:
>> hey cluster gurus..
>>
>> I have a 2 node cluster that's been running without issue for quite a
>> while.. all of a sudden one of the nodes will not completely start the
>> apache webserver service.. it looks like this
>>
>> [root@tf1 ~]# clustat
>> Member Status: Quorate
>>
>>   Member Name        Status
>>   ------ ----        ------
>>   tf1                Online, Local, rgmanager
>>   tf2                Online, rgmanager
>>
>>   Service Name       Owner (Last)     State
>>   ------- ----       ----- ------     -----
>>   Apache Service     tf1              starting
>>   postfix service    tf1              started
>> [root@tf1 ~]#
>>
>> and I see that the httpd is NOT started. although, if I do
>> /etc/init.d/httpd start
>> the service starts without issue.
>>
>> grepping for apache and http in the logs, I see this..
>>
>> Jul 10 14:32:13 tf1 httpd: httpd shutdown failed
>> Jul 10 14:32:52 tf1 httpd: httpd shutdown failed
>> Jul 10 14:33:11 tf1 httpd: httpd shutdown failed
>> Jul 10 14:33:57 tf1 httpd: Syntax error on line 117 of /etc/httpd/conf.d/ssl.conf:
>> Jul 10 14:33:57 tf1 httpd: SSLCertificateFile: file '/etc/httpd/conf/ssl.crt/server.crt' does not exist or is empty
>> Jul 10 14:33:57 tf1 httpd: httpd startup failed
>> Jul 10 14:34:06 tf1 httpd: Syntax error on line 117 of /etc/httpd/conf.d/ssl.conf:
>> Jul 10 14:34:06 tf1 httpd: SSLCertificateFile: file '/etc/httpd/conf/ssl.crt/server.crt' does not exist or is empty
>> Jul 10 14:34:06 tf1 httpd: httpd startup failed
>> Jul 10 14:34:08 tf1 httpd: httpd shutdown failed
>> Jul 10 16:23:33 tf1 clurgmgrd: [6168]: <info> Executing /etc/init.d/httpd stop
>> Jul 10 16:23:34 tf1 httpd: httpd shutdown failed
>> Jul 10 16:24:31 tf1 httpd: httpd shutdown failed
>> Jul 10 16:24:36 tf1 httpd: httpd shutdown failed
>> Jul 10 16:24:41 tf1 httpd: httpd startup succeeded
>> Jul 10 18:10:13 tf1 clurgmgrd: [6231]: <info> Executing /etc/init.d/httpd stop
>> Jul 10 18:10:13 tf1 httpd: httpd shutdown failed
>> Jul 10 18:22:00 tf1 httpd: httpd startup succeeded
>>
>> [root@tf1 log]# grep apache messages
>> Jul 10 04:40:00 tf1 clurgmgrd[6267]: <notice> stop on script "cluster_apache" returned 1 (generic error)
>> Jul 10 10:04:33 tf1 clurgmgrd[6149]: <notice> stop on script "cluster_apache" returned 1 (generic error)
>> Jul 10 14:29:54 tf1 clurgmgrd[6281]: <notice> stop on script "cluster_apache" returned 1 (generic error)
>> Jul 10 16:23:34 tf1 clurgmgrd[6168]: <notice> stop on script "cluster_apache" returned 1 (generic error)
>> Jul 10 18:10:13 tf1 clurgmgrd[6231]: <notice> stop on script "cluster_apache" returned 1 (generic error)
>> [root@tf1 log]#
>>
>> I'm guessing it's the
>> stop on script "cluster_apache" returned 1 (generic error)
>> but I looked at the /etc/init.d/httpd on tf1 and tf2 and they are both the
>> same size
>>
>> [root@tf2 ~]# ls -al /etc/init.d/httpd
>> -rwxr-xr-x 1 root root 3201 Jan 30 2007 /etc/init.d/httpd
>>
>> [root@tf1 log]# ls -al /etc/init.d/httpd
>> -rwxr-xr-x 1 root root 3201 Jan 30 2007 /etc/init.d/httpd
>>
>> and the apache service starts/stops just fine on tf2 when the services get
>> failed over to that machine.
>>
>> any ideas on what can be wrong?
>
> tf1 is complaining about a bad SSL cert. The fact that it's complaining
> when being started by clurgmgrd but not when started manually indicates
> that clurgmgrd is starting it differently (specifying a different
> httpd.conf file perhaps?).
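
The SSL complaint quoted above ("does not exist or is empty") can be checked directly on each node. What follows is only a rough sketch, not something from the original posts: check_cert_files is a hypothetical helper name, and it assumes the SSLCertificateFile paths in the config are absolute and contain no spaces. It uses `[ -s file ]`, which is true only when the file exists and has a size greater than zero.

```shell
# Hypothetical helper: list every SSLCertificateFile named in an Apache
# config file and report whether each one exists and is non-empty.
# Assumes absolute paths without spaces; commented-out directives are skipped.
check_cert_files() {
    conf=$1
    # awk strips leading whitespace, so $1 is the directive name.
    awk '$1 == "SSLCertificateFile" { print $2 }' "$conf" |
    while read -r cert; do
        if [ -s "$cert" ]; then
            echo "ok: $cert"
        else
            echo "missing or empty: $cert"
        fi
    done
}

# Example, using the config path from the log above:
# check_cert_files /etc/httpd/conf.d/ssl.conf
```

Running it on both tf1 and tf2 and comparing the output would show whether the certificate file really differs between the nodes.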

well, here's the relevant part of my config file

<rm>
        <failoverdomains>
                <failoverdomain name="httpd" ordered="1" restricted="1">
                        <failoverdomainnode name="tf1" priority="1"/>
                        <failoverdomainnode name="tf2" priority="2"/>
                </failoverdomain>
        </failoverdomains>
        <resources>
                <script file="/etc/init.d/httpd" name="cluster_apache"/>
                <ip address="192.168.1.7" monitor_link="1"/>
                <script file="/etc/init.d/postfix" name="cluster_posstfix"/>
        </resources>
        <service autostart="1" domain="httpd" name="Apache Service">
                <ip ref="192.168.1.7"/>
                <script ref="cluster_apache"/>
        </service>
        <service autostart="1" domain="httpd" name="postfix service">
                <ip ref="192.168.1.7"/>
                <script ref="cluster_posstfix"/>
        </service>
</rm>

I've never seen that SSL error when starting the service manually.

the other thing that I noticed.. is that when I try to do

[root@tf1 cluster]# clusvcadm -d "Apache Service"
Member tf1 disabling Apache Service...

it just hangs there and never returns.

Jason

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
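
The logs in this thread show clurgmgrd executing `/etc/init.d/httpd stop` and recording "returned 1 (generic error)", so the exit status of the init script is worth looking at by hand. The sketch below is illustrative only: report_rc is a hypothetical helper name, and the idea that rgmanager wants `stop` to return 0 even when the service is already stopped is my understanding of how script resources are handled, not something confirmed in the thread.

```shell
# Hypothetical helper: run one action of an init-style script and print
# the exit status, which is the number clurgmgrd reports in its log line.
report_rc() {
    script=$1
    action=$2
    rc=0
    # Discard the script's own output; only the status is of interest here.
    "$script" "$action" >/dev/null 2>&1 || rc=$?
    echo "$action returned $rc"
}

# Example, run as root on tf1 while httpd is NOT running:
# report_rc /etc/init.d/httpd stop
# report_rc /etc/init.d/httpd status
```

If `stop` comes back non-zero on tf1 when httpd is already down, that would line up with the "stop on script" messages and might explain why the service never leaves the "starting" state.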