Problem with service migration with Xen domU on different dom0 with Red Hat 5.4

Dear Sir / Madam:

I am implementing a two-node cluster on domUs that provides an Apache service, "webby". The two domUs run on different dom0 hosts. This Apache service also load-balances some JBoss virtual machines, but those are not part of the cluster. I have also configured a virtual machine as an iSCSI target to provide a shared quorum disk, so our quorum is 2 votes out of 3.
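To make the vote arithmetic explicit, here is a minimal sketch of the count. It assumes the usual cman rule that quorum is half of the expected votes (rounded down) plus one; the function name `quorum_threshold` is made up for illustration and is not part of any cluster tool.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum, assuming quorum = floor(expected / 2) + 1."""
    return expected_votes // 2 + 1


node_votes = [1, 1]  # two domU cluster nodes, one vote each
qdisk_votes = 1      # the shared quorum disk on the iSCSI target
expected = sum(node_votes) + qdisk_votes

print("expected votes:", expected)
print("votes needed for quorum:", quorum_threshold(expected))
```

Under that assumption, one surviving node plus the quorum disk still holds 2 votes, which is why the quorum disk matters in a two-node setup.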

The first thing I noticed is that when I finished configuring the cluster with luci, the webby service did not start automatically. I had to enable the service, and then it started.

Initially I had a problem with fence_xvm. When I configured each dom0 as an individual cluster and started cman on the dom0, it used to start fence_xvmd. Then I read somewhere that the dom0s had to be in a separate cluster, so I created another cluster containing both dom0s, but now they are not starting fence_xvmd. That is why I am running fence_xvmd standalone with this command:

fence_xvmd -LX -a 225.0.0.1 -I eth3

When I tried fencing from the domU on the command line, it worked. I used the command:

fence_xvm -a 225.0.0.1 -I eth1 -H frederick -ddd -o null

and it produced:

Waiting for connection from XVM host daemon.
Issuing TCP challenge
Responding to TCP challenge
TCP Exchange + Authentication done...
Waiting for return value from XVM host
Remote: Operation failed

In luci I configured multicast address 225.0.0.1 and interface eth1 for the cluster on the domUs, and on the dom0s I configured multicast address 225.0.0.1 and interface eth3 from the CLI.

Perhaps the problem is with the keys. I use one key that is shared between dom0 and domU on server1, and another key that is shared between dom0 and domU on server2. On server1 I copied the key fence_xvm.key as fence_xvm-host1.key and distributed it to the other domU and both dom0s. On server2 I copied the key fence_xvm.key as fence_xvm-host2.key and distributed it to the other domU and both dom0s.
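One way to rule out a bad copy is to compare digests of the key files. The sketch below is only an illustration: the helper names `key_digest` and `keys_match` are hypothetical, and it assumes both files are readable on the machine where it runs.

```python
import hashlib
from pathlib import Path


def key_digest(path):
    """Return the SHA-256 hex digest of a key file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def keys_match(original, copy):
    """True when both files hold byte-identical key material."""
    return key_digest(original) == key_digest(copy)
```

On a server1 machine, for example, `keys_match("/etc/cluster/fence_xvm.key", "/etc/cluster/fence_xvm-host1.key")` would be expected to return True if the copy described above is intact. Comparing the printed digests across hosts gives the same check for the distributed copies.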

My cluster config is the following:

<?xml version="1.0"?>
<cluster alias="clusterapache01" config_version="52" name="clusterapache01">
    <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="60"/>
    <clusternodes>
        <clusternode name="172.19.52.121" nodeid="1" votes="1">
            <fence>
                <method name="1">
                    <device domain="vmapache1" name="xenfence1"/>
                </method>
            </fence>
            <multicast addr="225.0.0.1" interface="eth1"/>
        </clusternode>
        <clusternode name="172.19.52.122" nodeid="2" votes="1">
            <fence>
                <method name="1">
                    <device domain="vmapache2" name="xenfence2"/>
                </method>
            </fence>
            <multicast addr="225.0.0.1" interface="eth1"/>
        </clusternode>
    </clusternodes>
    <cman expected_votes="3">
        <multicast addr="225.0.0.1"/>
    </cman>
    <fencedevices>
        <fencedevice agent="fence_xvm" key_file="/etc/cluster/fence_xvm-host1.key" name="xenfence1"/>
        <fencedevice agent="fence_xvm" key_file="/etc/cluster/fence_xvm-host2.key" name="xenfence2"/>
    </fencedevices>
    <rm log_level="7">
        <failoverdomains>
            <failoverdomain name="prefer_node1" nofailback="1" ordered="1" restricted="1">
                <failoverdomainnode name="172.19.52.121" priority="1"/>
                <failoverdomainnode name="172.19.52.122" priority="2"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <ip address="172.19.52.120" monitor_link="1"/>
            <apache config_file="conf/httpd.conf" name="httpd" server_root="/etc/httpd" shutdown_wait="0"/>
            <netfs export="/data" force_unmount="0" fstype="nfs4" host="172.19.50.114" mountpoint="/var/www/html" name="htdoc" options="rw,no_root_squash"/>
        </resources>
        <service autostart="1" domain="prefer_node1" exclusive="0" name="webby" recovery="relocate">
            <ip ref="172.19.52.120"/>
            <apache ref="httpd"/>
        </service>
    </rm>
    <fence_xvmd/>
    <totem consensus="4800" join="60" token="10000" token_retransmits_before_loss_const="20"/>
    <quorumd device="/dev/sda1" interval="2" min_score="1" tko="10" votes="1">
        <heuristic interval="2" program="ping -c1 -t1 172.19.52.119" score="1"/>
    </quorumd>
</cluster>
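
As a reading aid, the following sketch pulls the fencing relationships out of a cluster.conf so that each node's Xen domain, fence device, and key file can be seen side by side. The embedded `CONF` string is a trimmed copy of the fencing parts of the configuration above, and `fence_map` is a hypothetical helper name, not an existing cluster utility.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the fencing-related parts of the cluster.conf above.
CONF = """<cluster alias="clusterapache01" config_version="52" name="clusterapache01">
  <clusternodes>
    <clusternode name="172.19.52.121" nodeid="1" votes="1">
      <fence><method name="1">
        <device domain="vmapache1" name="xenfence1"/>
      </method></fence>
    </clusternode>
    <clusternode name="172.19.52.122" nodeid="2" votes="1">
      <fence><method name="1">
        <device domain="vmapache2" name="xenfence2"/>
      </method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_xvm" key_file="/etc/cluster/fence_xvm-host1.key" name="xenfence1"/>
    <fencedevice agent="fence_xvm" key_file="/etc/cluster/fence_xvm-host2.key" name="xenfence2"/>
  </fencedevices>
</cluster>"""


def fence_map(conf_xml):
    """Map each cluster node name to a list of (domain, device, key_file)."""
    root = ET.fromstring(conf_xml)
    # Look up each fence device's key file by the device name.
    key_files = {d.get("name"): d.get("key_file") for d in root.iter("fencedevice")}
    result = {}
    for node in root.iter("clusternode"):
        result[node.get("name")] = [
            (dev.get("domain"), dev.get("name"), key_files.get(dev.get("name")))
            for dev in node.iter("device")
        ]
    return result


for node, entries in fence_map(CONF).items():
    for domain, device, key_file in entries:
        print(node, domain, device, key_file)
```

Reading a real file instead would only need `fence_map(open("/etc/cluster/cluster.conf").read())`, assuming the file is at that path.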

Another strange thing is that when I run clustat on vmapache1, it shows the webby service as started on vmapache1, with both nodes and the quorum disk online. On vmapache, however, clustat shows only both nodes and the quorum disk online, and nothing about any service.

This is the log from when I tried to migrate the service:
Apr 22 21:39:14 vmapache01 ccsd[2183]: Update of cluster.conf complete (version 51 -> 52).
Apr 22 21:39:23 vmapache01 clurgmgrd[2331]: <notice> Reconfiguring
Apr 22 21:39:23 vmapache01 clurgmgrd[2331]: <info> Loading Service Data
Apr 22 21:39:23 vmapache01 clurgmgrd[2331]: <info> Applying new configuration #52
Apr 22 21:39:23 vmapache01 clurgmgrd[2331]: <info> Stopping changed resources.
Apr 22 21:39:23 vmapache01 clurgmgrd[2331]: <info> Restarting changed resources.
Apr 22 21:39:23 vmapache01 clurgmgrd[2331]: <info> Starting changed resources.
Apr 22 21:40:07 vmapache01 clurgmgrd[2331]: <notice> Stopping service service:webby
Apr 22 21:40:07 vmapache01 clurgmgrd: [2331]: <info> Stopping Service apache:httpd
Apr 22 21:40:07 vmapache01 clurgmgrd: [2331]: <err> Checking Existence Of File /var/run/cluster/apache/apache:httpd.pid [apache:httpd] > Failed - File Doesn't Exist
Apr 22 21:40:07 vmapache01 clurgmgrd: [2331]: <info> Stopping Service apache:httpd > Succeed
Apr 22 21:40:07 vmapache01 clurgmgrd[2331]: <notice> Service service:webby is disabled
Apr 22 21:40:07 vmapache01 clurgmgrd[2331]: <notice> Starting disabled service service:webby
Apr 22 21:40:08 vmapache01 clurgmgrd: [2331]: <info> Adding IPv4 address 172.19.52.120/24 to eth0
Apr 22 21:40:09 vmapache01 clurgmgrd: [2331]: <info> Starting Service apache:httpd
Apr 22 21:40:09 vmapache01 clurgmgrd[2331]: <notice> Service service:webby started
Apr 22 21:43:29 vmapache01 qdiskd[5855]: <info> Quorum Daemon Initializing
Apr 22 21:43:30 vmapache01 qdiskd[5855]: <info> Heuristic: 'ping -c1 -t1 172.19.52.119' UP
Apr 22 21:43:49 vmapache01 qdiskd[5855]: <info> Initial score 1/1
Apr 22 21:43:49 vmapache01 qdiskd[5855]: <info> Initialization complete
Apr 22 21:43:49 vmapache01 openais[2189]: [CMAN ] quorum device registered
Apr 22 21:43:49 vmapache01 qdiskd[5855]: <notice> Score sufficient for master operation (1/1; required=1); upgrading
Apr 22 21:44:13 vmapache01 qdiskd[5855]: <info> Assuming master role
Apr 22 21:47:31 vmapache01 clurgmgrd[2331]: <notice> Stopping service service:webby
Apr 22 21:47:31 vmapache01 clurgmgrd: [2331]: <info> Stopping Service apache:httpd
Apr 22 21:47:33 vmapache01 clurgmgrd: [2331]: <err> Stopping Service apache:httpd > Failed - Application Is Still Running
Apr 22 21:47:33 vmapache01 clurgmgrd: [2331]: <err> Stopping Service apache:httpd > Failed
Apr 22 21:47:33 vmapache01 clurgmgrd[2331]: <notice> stop on apache "httpd" returned 1 (generic error)
Apr 22 21:47:33 vmapache01 clurgmgrd: [2331]: <info> Removing IPv4 address 172.19.52.120/24 from eth0
Apr 22 21:47:43 vmapache01 clurgmgrd[2331]: <crit> #12: RG service:webby failed to stop; intervention required
Apr 22 21:47:43 vmapache01 clurgmgrd[2331]: <notice> Service service:webby is failed
Apr 22 21:47:43 vmapache01 clurgmgrd[2331]: <warning> #70: Failed to relocate service:webby; restarting locally
Apr 22 21:47:43 vmapache01 clurgmgrd[2331]: <err> #43: Service service:webby has failed; can not start.
Apr 22 21:47:43 vmapache01 clurgmgrd[2331]: <alert> #2: Service service:webby returned failure code.  Last Owner: 172.19.52.121
Apr 22 21:47:43 vmapache01 clurgmgrd[2331]: <alert> #4: Administrator intervention required.
Apr 22 21:50:31 vmapache01 clurgmgrd[2331]: <notice> Stopping service service:webby
Apr 22 21:50:31 vmapache01 clurgmgrd: [2331]: <info> Stopping Service apache:httpd
Apr 22 21:50:31 vmapache01 clurgmgrd: [2331]: <err> Checking Existence Of File /var/run/cluster/apache/apache:httpd.pid [apache:httpd] > Failed - File Doesn't Exist
Apr 22 21:50:31 vmapache01 clurgmgrd: [2331]: <info> Stopping Service apache:httpd > Succeed
Apr 22 21:50:31 vmapache01 clurgmgrd[2331]: <notice> Service service:webby is disabled
Apr 22 21:50:31 vmapache01 clurgmgrd[2331]: <notice> Starting disabled service service:webby
Apr 22 21:50:31 vmapache01 clurgmgrd: [2331]: <info> Adding IPv4 address 172.19.52.120/24 to eth0
Apr 22 21:50:32 vmapache01 clurgmgrd: [2331]: <info> Starting Service apache:httpd
Apr 22 21:50:33 vmapache01 clurgmgrd[2331]: <notice> Service service:webby started
Apr 22 21:50:50 vmapache01 clurgmgrd[2331]: <notice> Stopping service service:webby
Apr 22 21:50:51 vmapache01 clurgmgrd: [2331]: <info> Stopping Service apache:httpd
Apr 22 21:50:52 vmapache01 clurgmgrd: [2331]: <err> Stopping Service apache:httpd > Failed - Application Is Still Running
Apr 22 21:50:52 vmapache01 clurgmgrd: [2331]: <err> Stopping Service apache:httpd > Failed
Apr 22 21:50:52 vmapache01 clurgmgrd[2331]: <notice> stop on apache "httpd" returned 1 (generic error)
Apr 22 21:50:52 vmapache01 clurgmgrd: [2331]: <info> Removing IPv4 address 172.19.52.120/24 from eth0
Apr 22 21:51:02 vmapache01 clurgmgrd[2331]: <crit> #12: RG service:webby failed to stop; intervention required
Apr 22 21:51:02 vmapache01 clurgmgrd[2331]: <notice> Service service:webby is failed
Apr 22 21:51:02 vmapache01 clurgmgrd[2331]: <warning> #70: Failed to relocate service:webby; restarting locally
Apr 22 21:51:02 vmapache01 clurgmgrd[2331]: <err> #43: Service service:webby has failed; can not start.
Apr 22 21:51:02 vmapache01 clurgmgrd[2331]: <alert> #2: Service service:webby returned failure code.  Last Owner: 172.19.52.121
Apr 22 21:51:02 vmapache01 clurgmgrd[2331]: <alert> #4: Administrator intervention required.
Apr 22 21:52:41 vmapache01 clurgmgrd[2331]: <notice> Stopping service service:webby
Apr 22 21:52:41 vmapache01 clurgmgrd: [2331]: <info> Stopping Service apache:httpd
Apr 22 21:52:41 vmapache01 clurgmgrd: [2331]: <err> Checking Existence Of File /var/run/cluster/apache/apache:httpd.pid [apache:httpd] > Failed - File Doesn't Exist
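
Since the log is long, a small filter can help pick out the lines that matter. This is a minimal sketch, assuming the severity tags appear in angle brackets as they do above; `problems` is a hypothetical helper and `SAMPLE` holds three lines copied from the log.

```python
import re

# Severity tags worth attention; <info> and <notice> lines are skipped.
SEVERITY = re.compile(r"<(warning|err|crit|alert)>")

SAMPLE = """\
Apr 22 21:47:31 vmapache01 clurgmgrd: [2331]: <info> Stopping Service apache:httpd
Apr 22 21:47:33 vmapache01 clurgmgrd: [2331]: <err> Stopping Service apache:httpd > Failed - Application Is Still Running
Apr 22 21:47:43 vmapache01 clurgmgrd[2331]: <crit> #12: RG service:webby failed to stop; intervention required
"""


def problems(log_text):
    """Return (severity, line) pairs for lines tagged warning or above."""
    found = []
    for line in log_text.splitlines():
        match = SEVERITY.search(line)
        if match:
            found.append((match.group(1), line))
    return found


for level, line in problems(SAMPLE):
    print(level, "|", line)
```

Run against the full log, this leaves the two kinds of apache stop messages (the missing PID file and "Application Is Still Running") and the rgmanager messages that follow them.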

If you see something wrong, please let me know. Any help or ideas will be appreciated.

Best regards,




 

-----------------------------------------
Carlos Vermejo Ruiz
-------------------------------------------
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
