I see you figured out your multiple ports fencing issue. Good, that
saves me a rant about system-config-cluster ... ;-)
First thing to test is that you can configure the IP address manually,
mount the filesystem, and start apache "the old-fashioned way", using
the /etc/init.d/httpd script on either machine.
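For what it's worth, here is a rough sketch of that manual test using the values from your cluster.conf. The interface name (eth0), the /24 prefix length and the gfs mount type are assumptions on my part, and run/manual_start are just made-up helper names. By default it only prints the commands; set DRY_RUN=0 (as root) to actually execute them.

```shell
#!/bin/sh
# Hypothetical helpers for the manual test; not part of Cluster Suite.

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = 1 ] || "$@"
}

# manual_start ADDR/PREFIX IFACE FSTYPE DEVICE MOUNTPOINT
# Same order the cluster service needs: IP first, then filesystem, then apache.
manual_start() {
    run ip addr add "$1" dev "$2" &&
    run mount -t "$3" "$4" "$5" &&
    run /etc/init.d/httpd start
}

# Example (assumed interface, prefix and fs type):
# manual_start 192.168.1.7/24 eth0 gfs /dev/mapper/diskarray-lv1 /mnt/gfs/htdocs
```

Remember to stop apache, unmount, and remove the address again on one node before repeating the test on the other.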
If that works, then I'd guess your problem with the cluster service is
that the <ip/> resource needs to be listed before the <script/>
resource inside the <service> block, since apache will bomb if the IP
address you told it to bind to isn't present (and I assume apache is
configured to bind to that address). In your cluster.conf the <script/>
ref comes first and the <ip/> ref comes last. If that's the problem, then
you should see an error about it in the apache error log.
As for nothing being logged about the cluster service trying to
start, it SHOULD be logging in /var/log/messages, but I've seen some
weirdness with this in the past. A healthy cluster node should show
something like this when the service starts:
Jun 22 09:36:51 knob clurgmgrd[3652]: <notice> Starting stopped service maps_ip
Jun 22 09:36:51 knob clurgmgrd: [3652]: <info> Adding IPv4 address x.y.8.60 to eth0
Jun 22 09:36:52 knob clurgmgrd[3652]: <notice> Service maps_ip started
Jun 22 09:36:52 knob clurgmgrd[3652]: <notice> Starting stopped service httpd
Jun 22 09:36:52 knob clurgmgrd: [3652]: <info> Executing /etc/init.d/httpd start
Jun 22 09:36:54 knob httpd: httpd startup succeeded
Jun 22 09:36:54 knob clurgmgrd[3652]: <notice> Service httpd started
(I always find the concept of "starting" an IP address faintly
hilarious), and then you should see something like:
Jun 22 09:37:33 knob clurgmgrd: [3652]: <info> Executing /etc/init.d/httpd status
every 30 seconds or so.
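A quick way to check for those messages is to filter the syslog file for clurgmgrd lines. This is only a sketch: rgmanager_log is a made-up helper name, and the default path assumes syslog writes to /var/log/messages as described above.

```shell
#!/bin/sh
# rgmanager_log [FILE] [N]: hypothetical helper that prints the last N
# (default 20) clurgmgrd lines from a syslog-style file.
rgmanager_log() {
    grep 'clurgmgrd' "${1:-/var/log/messages}" | tail -n "${2:-20}"
}

# Example:
# rgmanager_log /var/log/messages 50
```

If that prints nothing at all on either node, the resource manager is either not running or not logging where you expect.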
That brings me to an important point: the apache init script doesn't
follow whatever standard Red Hat init scripts are supposed to follow
(there's a thread about this that I was involved in 6-9 months back),
with respect to the exit status of its actions. At least, it didn't at
the time; maybe they've fixed it (I hope, by now). The stop action
return(s/ed) non-zero (failure) if apache wasn't running. If the cluster
manager thinks the service has failed, it will first try to stop it
before starting it. If the apache script returns failure on the attempt
to stop it because it was stopped already, then the cluster manager will
think something's wrong and never try to start it. The upshot is that
you have to hack the init script to make it return 0 in this
situation. I took the cop-out approach of just forcing it to always
return 0:
 stop() {
         echo -n $"Stopping $prog: "
         killproc $httpd
-        RETVAL=$?
+        RETVAL=0  # makes cluster admin less crazy
         echo
         [ $RETVAL = 0 ] && rm -f ${lockfile} ${pidfile}
 }
which should be safe enough (if killproc fails to kill it you've
probably got bigger problems on your hands), but could be better.
Someone else may have pasted a better patch on this list, check the
archives.
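To give an idea of what "better" might look like, here is a standalone sketch of the logic: treat "already stopped" as success, but still report failure if a live process can't be signalled. This is not the patch from the archives and not the stock init script. stop_if_running is a made-up name, it works from a pidfile rather than through killproc, it assumes it runs as root, and it does not wait for the process to exit.

```shell
#!/bin/sh
# Hypothetical helper, not part of the stock httpd init script.
# stop_if_running PIDFILE: stop the process named in PIDFILE,
# counting "already stopped" as success.
stop_if_running() {
    pidfile=$1
    # No pidfile: nothing to stop, report success.
    [ -f "$pidfile" ] || return 0
    pid=$(cat "$pidfile")
    # Empty or stale pidfile: process is gone, clean up and report success.
    # (Assumes root, so a kill -0 failure means "no such process".)
    if [ -z "$pid" ] || ! kill -0 "$pid" 2>/dev/null; then
        rm -f "$pidfile"
        return 0
    fi
    # Process is alive: failing to signal it is a real failure.
    kill "$pid" || return 1
    rm -f "$pidfile"
    return 0
}
```

The same idea could be folded into the init script's stop() by checking whether httpd is running before calling killproc and only keeping killproc's return value in that case.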
I just checked a fresh install of httpd on an AS 4 latest box, and the
script is still the same. Convenient, since httpd is the specific
example service used for setting up a cluster service in the Cluster
Suite docs. ;-)
I hope this helps - I'll stop rambling now.
Oh, one other thing - if the filesystem is GFS, why bother
mounting/unmounting at all? Just have it mounted in fstab, or make it a
separate cluster service if you want the extra assurance that it'll stay
mounted.
-g
Jason wrote:
ok, one last question, I hope... I'm following the directions at
http://www.redhat.com/docs/manuals/csgfs/browse/rh-cs-en/s1-apache-inshttpd.html
to set up apache as a test... and I cannot see that apache gets started on either of my cluster
nodes (only 2).
The IP address I've configured is an unused IP address in the subnet that both boxes are
on. How/where can I troubleshoot this? I don't see anything in the logs about the service trying
to start. Here is my cluster.conf:
<?xml version="1.0"?>
<cluster config_version="22" name="progressive">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="tf1" votes="1">
      <fence>
        <method name="1">
          <device name="apc_power_switch" option="off" port="1" switch="1"/>
          <device name="apc_power_switch" option="off" port="2" switch="1"/>
          <device name="apc_power_switch" option="on" port="1" switch="1"/>
          <device name="apc_power_switch" option="on" port="2" switch="1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="tf2" votes="1">
      <fence>
        <method name="1">
          <device name="apc_power_switch" option="off" port="3" switch="1"/>
          <device name="apc_power_switch" option="off" port="4" switch="1"/>
          <device name="apc_power_switch" option="on" port="3" switch="1"/>
          <device name="apc_power_switch" option="on" port="4" switch="1"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_apc" ipaddr="192.168.1.8" login="apc" name="apc_power_switch" passwd="apc"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="httpd" ordered="1" restricted="1">
        <failoverdomainnode name="tf1" priority="1"/>
        <failoverdomainnode name="tf2" priority="2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <script file="/etc/init.d/httpd" name="cluster_apache"/>
      <fs device="/dev/mapper/diskarray-lv1" fstype="ext3" mountpoint="/mnt/gfs/htdocs" name="apache_content"/>
      <ip address="192.168.1.7" monitor_link="1"/>
    </resources>
    <service autostart="1" domain="httpd" name="Apache Service">
      <script ref="cluster_apache"/>
      <fs ref="apache_content"/>
      <ip ref="192.168.1.7"/>
    </service>
  </rm>
</cluster>
Ooh, the other thing is that I had to lie about the filesystem type it lives on: it only gave
me the ext2/ext3 options (I chose ext3), but it's on a GFS partition.
Jason
--
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster