Hi
I think I am almost there. I have started using RHEL 6, hoping it would not give me any nightmares this time, to set up a two-node cluster for an Apache cluster service, and I think I have done pretty much everything.
In short,
1. Two nodes with private IPs on eth0: 192.168.18.10 and 192.168.18.11.
2. The nodes are named node1.localdomain and node2.localdomain; /etc/hosts is set up accordingly.
3. I created the cluster, added the two nodes, and added the service WEB (with the child resources ip and apache).
4. The cluster is quorate and detects the other node going offline without problems.
5. I tested start/stop of the WEB service using "rg_test"; it worked fine on both nodes.
6. However, for some reason the service does not start, or fail over to the other node, when I test it manually (using clusvcadm -e WEB), reboot a node, or try anything else.
7. Please let me know how to verify cluster startup and failover manually, to make sure everything works.
8. What am I missing that keeps this from working? Please assist.
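For item 7, this is roughly the manual check I have in mind, written as a small shell sketch. The function names svc_state and failover_check are my own (hypothetical), and I am assuming, as far as I understand rgmanager, that a service in the "failed" state has to be disabled with clusvcadm -d before it can be enabled again. Nothing runs until failover_check is called explicitly.

```shell
# Sketch only; helper names are hypothetical, not part of rgmanager.

# Read `clustat` output on stdin and print the State column
# (last field) for the given service name.
svc_state() {
    awk -v svc="service:$1" '$1 == svc { print $NF }'
}

# Disable, enable, then relocate the service, printing its state
# after the enable and relocate steps.
# Usage: failover_check WEB node2.localdomain
failover_check() {
    svc=$1
    target=$2
    clusvcadm -d "$svc"                # assumption: clears the "failed" state
    clusvcadm -e "$svc"                # start according to the failover domain
    clustat | svc_state "$svc"
    clusvcadm -r "$svc" -m "$target"   # relocate to the other member
    clustat | svc_state "$svc"
}
```

If the state printed after each step is "started", I would take that as startup and relocation working; a "failed" after the enable step would be the problem I am seeing now.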
Please go through the output of all the commands included below.
Let me know if anything else is required.
Param

/etc/cluster/cluster.conf

<?xml version="1.0"?>
<cluster config_version="14" name="httpdCluster">
  <logging debug="on"/>
  <cman expected_votes="1" two_node="1"/>
  <clusternodes>
    <clusternode name="node1.localdomain" nodeid="1" votes="1">
      <fence>
        <method name="single"/>
      </fence>
    </clusternode>
    <clusternode name="node2.localdomain" nodeid="2" votes="1">
      <fence>
        <method name="single"/>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices/>
  <rm>
    <failoverdomains>
      <failoverdomain name="myFailOver" nofailback="0" ordered="1" restricted="0">
        <failoverdomainnode name="node1.localdomain" priority="1"/>
        <failoverdomainnode name="node2.localdomain" priority="2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <apache config_file="conf/httpd.conf" name="apache" server_root="/etc/httpd" shutdown_wait="0"/>
    </resources>
    <service autostart="1" domain="myFailOver" exclusive="1" name="WEB" recovery="relocate">
      <ip address="192.168.18.50" monitor_link="1" sleeptime="10">
        <apache config_file="conf/httpd.conf" name="WEB" server_root="/etc/httpd" shutdown_wait="0"/>
      </ip>
    </service>
  </rm>
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
</cluster>

================================================

[root@node2 apache]# clustat
Cluster Status for httpdCluster @ Mon Aug 27 20:13:24 2012
Member Status: Quorate

 Member Name                  ID   Status
 ------ ----                  ---- ------
 node1.localdomain               1 Online, rgmanager
 node2.localdomain               2 Online, Local, rgmanager

 Service Name        Owner (Last)             State
 ------- ----        ----- ------             -----
 service:WEB         (node2.localdomain)      failed

[root@node2 apache]# ps -eaf | grep httpd
root     17219  3171  0 20:15 pts/0    00:00:00 grep httpd

[root@node2 apache]# ip addr list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
    link/ether 00:0c:29:1a:5b:cf brd ff:ff:ff:ff:ff:ff
    inet 192.168.18.11/24 brd 192.168.18.255 scope global eth0
    inet6 fe80::20c:29ff:fe1a:5bcf/64 scope link
       valid_lft forever preferred_lft forever

[root@node2 apache]# /usr/share/cluster/apache.sh start service:WEB
<debug> Verifying Configuration Of default
Verifying Configuration Of default
<error> Verifying Configuration Of default > Failed - Invalid Name Of Service
Verifying Configuration Of default > Failed - Invalid Name Of Service

[root@node2 apache]# rg_test test /etc/cluster/cluster.conf start service WEB
Running in test mode.
Loading resource rule from /usr/share/cluster/openldap.sh
Loading resource rule from /usr/share/cluster/apache.sh
Loading resource rule from /usr/share/cluster/named.sh
Loading resource rule from /usr/share/cluster/lvm_by_lv.sh
Loading resource rule from /usr/share/cluster/SAPDatabase
Loading resource rule from /usr/share/cluster/postgres-8.sh
Loading resource rule from /usr/share/cluster/clusterfs.sh
Loading resource rule from /usr/share/cluster/ip.sh
Loading resource rule from /usr/share/cluster/service.sh
Loading resource rule from /usr/share/cluster/script.sh
Loading resource rule from /usr/share/cluster/nfsserver.sh
Loading resource rule from /usr/share/cluster/nfsexport.sh
Loading resource rule from /usr/share/cluster/tomcat-6.sh
Loading resource rule from /usr/share/cluster/lvm.sh
Loading resource rule from /usr/share/cluster/lvm_by_vg.sh
Loading resource rule from /usr/share/cluster/SAPInstance
Loading resource rule from /usr/share/cluster/vm.sh
Loading resource rule from /usr/share/cluster/ASEHAagent.sh
Loading resource rule from /usr/share/cluster/samba.sh
Loading resource rule from /usr/share/cluster/netfs.sh
Loading resource rule from /usr/share/cluster/fs.sh
Loading resource rule from /usr/share/cluster/mysql.sh
Loading resource rule from /usr/share/cluster/nfsclient.sh
Loading resource rule from /usr/share/cluster/oracledb.sh
Loading resource rule from /usr/share/cluster/ocf-shellfuncs
Loading resource rule from /usr/share/cluster/svclib_nfslock
Starting WEB...
<debug> Link for eth0: Detected
Link for eth0: Detected
<info> Adding IPv4 address 192.168.18.50/24 to eth0
Adding IPv4 address 192.168.18.50/24 to eth0
<debug> Pinging addr 192.168.18.50 from dev eth0
Pinging addr 192.168.18.50 from dev eth0
<debug> Sending gratuitous ARP: 192.168.18.50 00:0c:29:1a:5b:cf brd ff:ff:ff:ff:ff:ff
Sending gratuitous ARP: 192.168.18.50 00:0c:29:1a:5b:cf brd ff:ff:ff:ff:ff:ff
rdisc: no process killed
<debug> Verifying Configuration Of apache:WEB
Verifying Configuration Of apache:WEB
<debug> Checking Syntax Of The File /etc/httpd/conf/httpd.conf
Checking Syntax Of The File /etc/httpd/conf/httpd.conf
<debug> Checking Syntax Of The File /etc/httpd/conf/httpd.conf > Succeed
Checking Syntax Of The File /etc/httpd/conf/httpd.conf > Succeed
<info> Starting Service apache:WEB
Starting Service apache:WEB
<debug> Looking For IP Addresses
Looking For IP Addresses
Query failed: Invalid argument (/cluster/rm/service[@name="WEB"]/ip[2]/@address)
<debug> Looking For IP Addresses > Succeed - IP Addresses Found
Looking For IP Addresses > Succeed - IP Addresses Found
<debug> Checking: SHA1 checksum of config file /etc/cluster/apache/apache:WEB/httpd.conf
Checking: SHA1 checksum of config file /etc/cluster/apache/apache:WEB/httpd.conf
<debug> Checking: SHA1 checksum > succeed
Checking: SHA1 checksum > succeed
<debug> Generating New Config File /etc/cluster/apache/apache:WEB/httpd.conf From /etc/httpd/conf/httpd.conf
Generating New Config File /etc/cluster/apache/apache:WEB/httpd.conf From /etc/httpd/conf/httpd.conf
<debug> Generating New Config File /etc/cluster/apache/apache:WEB/httpd.conf From /etc/httpd/conf/httpd.conf > Succeed
Generating New Config File /etc/cluster/apache/apache:WEB/httpd.conf From /etc/httpd/conf/httpd.conf > Succeed
<debug> Starting Service apache:WEB > Succeed
Starting Service apache:WEB > Succeed
Start of WEB complete

[root@node2 apache]# ip addr list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
    link/ether 00:0c:29:1a:5b:cf brd ff:ff:ff:ff:ff:ff
    inet 192.168.18.11/24 brd 192.168.18.255 scope global eth0
    inet 192.168.18.50/24 scope global secondary eth0
    inet6 fe80::20c:29ff:fe1a:5bcf/64 scope link
       valid_lft forever preferred_lft forever

[root@node2 apache]# ps -eaf | grep httpd | wc -l
10

[root@node2 apache]# wget http://192.168.18.50
--2012-08-27 20:17:15--  http://192.168.18.50/
Connecting to 192.168.18.50:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 22 [text/html]
Saving to: `index.html'

100%[===============================================================================================================>] 22          --.-K/s   in 0s

2012-08-27 20:17:15 (3.98 MB/s) - `index.html' saved [22/22]

/var/log/messages

Aug 27 19:24:44 node2 rgmanager[9388]: Verifying Configuration Of default > Failed - Invalid Name Of Service
Aug 27 19:25:03 node2 rgmanager[9523]: Verifying Configuration Of default > Failed - Invalid Name Of Service
Aug 27 19:25:39 node2 rgmanager[10429]: Verifying Configuration Of default > Failed - Invalid Name Of Service
Aug 27 19:26:23 node2 rgmanager[10585]: Verifying Configuration Of default > Failed - Invalid Name Of Service
Aug 27 19:26:50 node2 rgmanager[10730]: Verifying Configuration Of default > Failed - Invalid Name Of Service
Aug 27 19:26:58 node2 rgmanager[10807]: Verifying Configuration Of default > Failed - Invalid Name Of Service
Aug 27 19:27:10 node2 rgmanager[10865]: (null)
Aug 27 19:27:31 node2 rgmanager[10973]: Verifying Configuration Of default > Failed - Invalid Name Of Service
Aug 27 19:28:28 node2 rgmanager[11148]: Verifying Configuration Of default > Failed - Invalid Name Of Service
Aug 27 19:28:33 node2 rgmanager[11226]: Verifying Configuration Of default > Failed - Invalid Name Of Service
Aug 27 19:30:58 node2 rgmanager[11587]: Verifying Configuration Of default > Failed - Invalid Name Of Service
Aug 27 19:31:03 node2 rgmanager[11665]: Verifying Configuration Of default > Failed - Invalid Name Of Service
Aug 27 19:31:06 node2 rgmanager[11733]: Verifying Configuration Of default > Failed - Invalid Name Of Service
Aug 27 19:36:58 node2 rgmanager[12495]: is not configured
Aug 27 19:38:43 node2 rgmanager[12884]: Verifying Configuration Of default > Failed - Invalid Name Of Service
Aug 27 20:13:35 node2 rgmanager[16956]: Verifying Configuration Of default > Failed - Invalid Name Of Service
Aug 27 20:16:11 node2 rgmanager[17717]: Adding IPv4 address 192.168.18.50/24 to eth0
Aug 27 20:16:14 node2 in.rdiscd[17784]: setsockopt (IP_ADD_MEMBERSHIP): Address already in use
Aug 27 20:16:14 node2 in.rdiscd[17784]: Failed joining addresses
Aug 27 20:16:15 node2 rgmanager[17876]: Starting Service apache:WEB
Aug 27 20:16:16 node2 rgmanager[17940]: Query failed: Invalid argument (/cluster/rm/service[@name="WEB"]/ip[2]/@address)
Aug 27 20:17:31 node2 rgmanager[18737]: Stopping Service apache:WEB
Aug 27 20:17:33 node2 rgmanager[18771]: Stopping Service apache:WEB > Failed - Application Is Still Running
Aug 27 20:17:33 node2 rgmanager[18791]: Stopping Service apache:WEB > Failed
Aug 27 20:17:33 node2 rgmanager[18840]: Removing IPv4 address 192.168.18.50/24 from eth0

[root@node2 cluster]# clusvcadm -e WEB -m node2.localdomain
Member node2.localdomain trying to enable service:WEB...Aborted; service failed

[root@node2 cluster]# tail /var/log/messages
..
Aug 27 20:21:06 node2 rgmanager[1771]: #43: Service service:WEB has failed; can not start.
Aug 27 20:21:06 node2 rgmanager[1771]: #13: Service service:WEB failed to stop cleanly

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster