> First thing to test is that you can configure the IP address manually,
> mount the filesystem, and start apache "the old-fashioned way", using
> the /etc/init.d/httpd script on either machine.

[root@tf1 log]# /etc/init.d/httpd start
Starting httpd: (99)Cannot assign requested address: make_sock: could not bind to address 192.168.1.7:80
no listening sockets available, shutting down

> If that works, then I'd guess your problem with the cluster service is
> that the <ip> resource needs to be listed before the <script>
> resource, inside the <service/> block, since apache will bomb if the IP
> address you told it to bind to isn't present (and I assume apache is
> configured to bind to that address). If that's the case, then you
> should see an error concerning it in the apache error.log.
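That bind error is apache doing exactly what's described here: 192.168.1.7 isn't configured on any interface yet, so the manual test has to start by adding the address by hand. A minimal sketch of the test, assuming a /24 netmask (adjust to match the network; the device and mountpoint are from the cluster.conf below, and the mount can be skipped if fstab already did it):

    ip addr add 192.168.1.7/24 dev eth0      # or: ifconfig eth0:0 192.168.1.7 up
    mount -t gfs /dev/mapper/diskarray-lv1 /mnt/gfs/htdocs
    /etc/init.d/httpd start

and afterwards undo it all, so the cluster manager can own the address again:

    /etc/init.d/httpd stop
    umount /mnt/gfs/htdocs
    ip addr del 192.168.1.7/24 dev eth0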
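On the ordering point: going by the cluster.conf quoted at the bottom of this mail, the reordered <service> block would look something like this. (I've also put the filesystem ahead of the script, on the theory that apache wants its DocumentRoot mounted before it starts; that part is my own guess.)

    <service autostart="1" domain="httpd" name="Apache Service">
            <ip ref="192.168.1.7"/>
            <fs ref="apache_content"/>
            <script ref="cluster_apache"/>
    </service>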
> As far as nothing being logged about the cluster service trying to
> start, it SHOULD be logging in /var/log/messages, but I've seen some
> weirdness with this in the past. A healthy cluster node should show
> something like this when the service starts:
>
> Jun 22 09:36:51 knob clurgmgrd[3652]: <notice> Starting stopped service maps_ip
> Jun 22 09:36:51 knob clurgmgrd: [3652]: <info> Adding IPv4 address x.y.8.60 to eth0
> Jun 22 09:36:52 knob clurgmgrd[3652]: <notice> Service maps_ip started
> Jun 22 09:36:52 knob clurgmgrd[3652]: <notice> Starting stopped service httpd
> Jun 22 09:36:52 knob clurgmgrd: [3652]: <info> Executing /etc/init.d/httpd start
> Jun 22 09:36:54 knob httpd: httpd startup succeeded
> Jun 22 09:36:54 knob clurgmgrd[3652]: <notice> Service httpd started

Well, I see messages, but never ones from clurgmgrd:

Jul 1 08:27:10 tf1 network: Setting network parameters: succeeded
Jul 1 08:27:10 tf1 network: Bringing up loopback interface: succeeded
Jul 1 08:27:14 tf1 network: Bringing up interface eth0: succeeded
Jul 1 08:27:19 tf1 network: Bringing up interface eth2: succeeded
Jul 1 08:27:19 tf1 procfgd: Starting procfgd: succeeded
Jul 1 08:27:24 tf1 kernel: CMAN: Waiting to join or form a Linux-cluster
Jul 1 08:27:24 tf1 ccsd[3928]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.5
Jul 1 08:27:24 tf1 ccsd[3928]: Initial status:: Inquorate
Jul 1 08:27:56 tf1 kernel: CMAN: forming a new cluster
Jul 1 08:27:56 tf1 kernel: CMAN: quorum regained, resuming activity
Jul 1 08:27:56 tf1 ccsd[3928]: Cluster is quorate. Allowing connections.
Jul 1 08:27:56 tf1 kernel: DLM 2.6.9-41.7 (built May 22 2006 17:34:37) installed
Jul 1 08:27:56 tf1 cman: startup succeeded
Jul 1 08:27:56 tf1 lock_gulmd: no <gulm> section detected in /etc/cluster/cluster.conf succeeded
Jul 1 08:27:57 tf1 fenced: startup succeeded
Jul 1 08:27:59 tf1 clvmd: Cluster LVM daemon started - connected to CMAN
Jul 1 08:27:59 tf1 clvmd: clvmd startup succeeded
Jul 1 08:27:59 tf1 kernel: cdrom: open failed.
Jul 1 08:28:00 tf1 kernel: cdrom: open failed.
Jul 1 08:28:00 tf1 vgchange: 1 logical volume(s) in volume group "diskarray" now active
Jul 1 08:28:00 tf1 clvmd: Activating VGs: succeeded
Jul 1 08:28:00 tf1 netfs: Mounting other filesystems: succeeded
Jul 1 08:28:00 tf1 kernel: Lock_Harness 2.6.9-49.1 (built May 22 2006 17:38:48) installed
Jul 1 08:28:00 tf1 kernel: GFS 2.6.9-49.1 (built May 22 2006 17:39:06) installed
Jul 1 08:28:00 tf1 kernel: GFS: Trying to join cluster "lock_dlm", "progressive:lv1"
Jul 1 08:28:00 tf1 kernel: Lock_DLM (built May 22 2006 17:38:50) installed
Jul 1 08:28:02 tf1 kernel: GFS: fsid=progressive:lv1.0: Joined cluster. Now mounting FS...
Jul 1 08:28:02 tf1 kernel: GFS: fsid=progressive:lv1.0: jid=0: Trying to acquire journal lock...
Jul 1 08:28:02 tf1 kernel: GFS: fsid=progressive:lv1.0: jid=0: Looking at journal...
Jul 1 08:28:03 tf1 kernel: GFS: fsid=progressive:lv1.0: jid=0: Done

I compiled/installed all this from source. I'm guessing I missed the clurgmgrd part; I'll go back and look.
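For what it's worth, clurgmgrd is the resource group manager daemon from rgmanager, a separate piece from cman/ccsd/fenced/clvmd, all of which clearly started fine above. If rgmanager never got built and installed, nothing on the node ever reads the <rm> section of cluster.conf, which would explain the total silence. A quick way to check and fix, assuming the stock init script ends up in /etc/init.d:

    ps ax | grep clurgmgrd          # no output -> rgmanager isn't running
    /etc/init.d/rgmanager start     # start the resource group manager
    chkconfig rgmanager on          # have it start at boot from now on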
> (I always find the concept of "starting" an IP address faintly
> hilarious), and then you should see something like:
>
> Jun 22 09:37:33 knob clurgmgrd: [3652]: <info> Executing /etc/init.d/httpd status
>
> every 30 seconds or so.

Yeah, I never see this.

> That brings me to an important point - the apache init script doesn't
> follow whatever standard RedHat init scripts are supposed to follow
> (there's a thread about this that I was involved in 6-9 months back),
> with respect to the status command. At least, it didn't at the time;
> maybe they've fixed it by now (I hope). The stop action return(s/ed)
> non-zero (failure) if apache wasn't running. If the cluster manager
> thinks the service has failed, it will first try to stop it before
> starting it. If the apache script returns failure on the attempt to
> stop it because it was already stopped, then the cluster manager will
> think something's wrong and never try to start it. The upshot is that
> you have to hack the init script to make it return 0 in this
> situation. I took the cop-out approach of just forcing it to always
> return 0:
>
> stop() {
>         echo -n $"Stopping $prog: "
>         killproc $httpd
> -       RETVAL=$?
> +       RETVAL=0   # makes cluster admin less crazy
>         echo
>         [ $RETVAL = 0 ] && rm -f ${lockfile} ${pidfile}
> }
>
> which should be safe enough (if killproc fails to kill it, you've
> probably got bigger problems on your hands), but could be better.
> Someone else may have posted a better patch on this list; check the
> archives.
>
> I just checked a fresh install of httpd on an AS 4 latest box, and the
> script is still the same. Convenient, since httpd is the specific
> example service used for setting up a cluster service in the Cluster
> Suite docs. ;-)
>
> I hope this helps - I'll stop rambling now.
>
> Oh, one other thing - if the filesystem is GFS, why bother
> mounting/unmounting at all? Just have it mounted in fstab, or make it a
> separate cluster service if you want the extra assurance that it'll stay
> mounted.

Ooh, I do have it in the fstab... that's just me not fully understanding how all this is supposed to work.

Jason

> > <?xml version="1.0"?>
> > <cluster config_version="22" name="progressive">
> >     <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
> >     <clusternodes>
> >         <clusternode name="tf1" votes="1">
> >             <fence>
> >                 <method name="1">
> >                     <device name="apc_power_switch" option="off" port="1" switch="1"/>
> >                     <device name="apc_power_switch" option="off" port="2" switch="1"/>
> >                     <device name="apc_power_switch" option="on" port="1" switch="1"/>
> >                     <device name="apc_power_switch" option="on" port="2" switch="1"/>
> >                 </method>
> >             </fence>
> >         </clusternode>
> >         <clusternode name="tf2" votes="1">
> >             <fence>
> >                 <method name="1">
> >                     <device name="apc_power_switch" option="off" port="3" switch="1"/>
> >                     <device name="apc_power_switch" option="off" port="4" switch="1"/>
> >                     <device name="apc_power_switch" option="on" port="3" switch="1"/>
> >                     <device name="apc_power_switch" option="on" port="4" switch="1"/>
> >                 </method>
> >             </fence>
> >         </clusternode>
> >     </clusternodes>
> >     <cman expected_votes="1" two_node="1"/>
> >     <fencedevices>
> >         <fencedevice agent="fence_apc" ipaddr="192.168.1.8" login="apc" name="apc_power_switch" passwd="apc"/>
> >     </fencedevices>
> >     <rm>
> >         <failoverdomains>
> >             <failoverdomain name="httpd" ordered="1" restricted="1">
> >                 <failoverdomainnode name="tf1" priority="1"/>
> >                 <failoverdomainnode name="tf2" priority="2"/>
> >             </failoverdomain>
> >         </failoverdomains>
> >         <resources>
> >             <script file="/etc/init.d/httpd" name="cluster_apache"/>
> >             <fs device="/dev/mapper/diskarray-lv1" fstype="ext3" mountpoint="/mnt/gfs/htdocs" name="apache_content"/>
> >             <ip address="192.168.1.7" monitor_link="1"/>
> >         </resources>
> >         <service autostart="1" domain="httpd" name="Apache Service">
> >             <script ref="cluster_apache"/>
> >             <fs ref="apache_content"/>
> >             <ip ref="192.168.1.7"/>
> >         </service>
> >     </rm>
> > </cluster>
> >
> > Ooh, the other thing is that I had to lie about the filesystem it lives
> > on: the tool only gave me the ext2/ext3 options (I chose ext3), but it's
> > on a GFS partition.
> >
> > Jason

--
================================================
| Jason Welsh jason@xxxxxxxxxxxxxx             |
| http://monsterjam.org DSS PGP: 0x5E30CC98    |
| gpg key: http://monsterjam.org/gpg/          |
================================================

--
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster