On Thursday 18 October 2007 22:00:33 Lon Hohberger wrote:
> On Wed, 2007-10-17 at 11:41 +0200, Marc Grimme wrote:
> > Hello,
> > we are currently discussing Xen with clustering support. Some questions
> > came up that we are not sure how to answer. Perhaps you can help ;-) .
> >
> > Background: we are discussing a group of Xen dom0 hosts sharing all
> > devices and files via GFS. These in turn host a couple of virtual
> > Red Hat-clustered domU hosts, with or without GFS.
> >
> > 1. Live migration of clustered domU nodes:
> > When I live migrate a virtual domU cluster node to another dom0 Xen host,
> > the migration works ;-) , but the virtual cluster node is thrown out of
> > the cluster. Is this "works as designed"? I think the problem is that
> > the heartbeats do not arrive in time.
> > Does that lead to the conclusion that one cannot live migrate cluster
> > nodes?
>
> Depends. If you're using rgmanager to do migration, the migration is
> actually not live. In order to do live migration,
> change /usr/share/cluster/vm.sh...
>
> - where it says 'xm migrate ...'
> - change it to 'xm migrate -l ...'

Ok, got it. Still, did you try to live migrate a cluster node?

> That should enable live migration.
>
> > 2. Fencing:
> > What about fencing of the virtual domU cluster nodes? You are never sure
> > on which dom0 node a given domU cluster node runs. Is fencing via
> > fence_xvm[d] supported in such an environment? That is, how does a
> > virtual domU cluster node X running on dom0 Xen host x know that a
> > virtual domU cluster node Y is running on dom0 Xen host y when it gets
> > the request to fence node Y?
>
> Yes. Fence_xvmd is designed (specifically) to handle the case where the
> dom0 hosting a particular domU is not known. Note that this only works
> on RHEL5 with openais and such; fence_xvmd uses AIS checkpoints to store
> virtual machine locations.
>
> Notes:
> * the parent dom0 cluster still needs fencing, too :)

Yes. That's in place. Check.

> * do not mix domU and dom0 in the same cluster,

I didn't. Check.

> * all domUs within a dom0 cluster must have different domain names,

Oops. Do "hostname -d" on dom0 and "hostname -d" on domU need to be
different? What if they are empty? Or do you mean some other domain name?

Dom0:
[root@axqa01_2 ~]# hostname -d
[root@axqa01_2 ~]#

DomU:
[root@axqa03_1 ~]# hostname -d
cc.atix

> * do *not* reuse /etc/xen/fence_xvm.key between multiple dom0 clusters

I just did not use it.
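For reference, the change to /usr/share/cluster/vm.sh boils down to adding the
live flag to the xm call, roughly like this (a sketch only; the exact line and
the variable names around the call may differ in your rgmanager version):

  # /usr/share/cluster/vm.sh, migration code path (variable names illustrative)
  # original call:
  xm migrate "$OCF_RESKEY_name" "$target_host"
  # live-migration variant, per Lon's suggestion:
  xm migrate -l "$OCF_RESKEY_name" "$target_host"

And the fence_xvm side of the domU cluster looks roughly like this in
cluster.conf (a minimal sketch, not a verbatim copy of my configuration; node
id, device and method names are placeholders, and there is no key_file because
I am currently running fence_xvmd/fence_xvm with -C none -c none):

  <clusternode name="axqa03_2" nodeid="2">
          <fence>
                  <method name="1">
                          <device name="xvm" domain="axqa03_2"/>
                  </method>
          </fence>
  </clusternode>
  ...
  <fencedevices>
          <fencedevice name="xvm" agent="fence_xvm"/>
  </fencedevices>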
Dom0:
[root@axqa01_2 ~]# ps ax | grep [f]ence_xvmd
 1932 pts/1    S+     0:00 fence_xvmd -ddddd -f -c none -C none

So axqa03_2 runs on axqa01_2, and axqa03_1 runs on axqa01_1.

When I then run

./fence_xvm -ddddd -C none -c none -H axqa03_2

on axqa03_1, I get the following:

Waiting for response
Received 264 bytes
Adding IP 127.0.0.1 to list (family 2)
Adding IP 10.1.2.1 to list (family 2)
Adding IP 192.168.10.40 to list (family 2)
Adding IP 192.168.122.1 to list (family 2)
Closing Netlink connection
ipv4_listen: Setting up ipv4 listen socket
ipv4_listen: Success; fd = 3
Setting up ipv4 multicast send (225.0.0.12:1229)
Joining IP Multicast group (pass 1)
Joining IP Multicast group (pass 2)
Setting TTL to 2 for fd4
ipv4_send_sk: success, fd = 4
sign_request: no-op (HASH_NONE)
Sending to 225.0.0.12 via 127.0.0.1
Setting up ipv4 multicast send (225.0.0.12:1229)
Joining IP Multicast group (pass 1)
Joining IP Multicast group (pass 2)
Setting TTL to 2 for fd4
ipv4_send_sk: success, fd = 4
sign_request: no-op (HASH_NONE)
Sending to 225.0.0.12 via 10.1.2.1
Setting up ipv4 multicast send (225.0.0.12:1229)
Joining IP Multicast group (pass 1)
Joining IP Multicast group (pass 2)
Setting TTL to 2 for fd4
ipv4_send_sk: success, fd = 4
sign_request: no-op (HASH_NONE)
Sending to 225.0.0.12 via 192.168.10.40
Setting up ipv4 multicast send (225.0.0.12:1229)
Joining IP Multicast group (pass 1)
Joining IP Multicast group (pass 2)
Setting TTL to 2 for fd4
ipv4_send_sk: success, fd = 4
sign_request: no-op (HASH_NONE)
Sending to 225.0.0.12 via 192.168.122.1
Waiting for connection from XVM host daemon.
Issuing TCP challenge
tcp_challenge: no-op (AUTH_NONE)
Responding to TCP challenge
tcp_response: no-op (AUTH_NONE)
TCP Exchange + Authentication done...
Waiting for return value from XVM host
Remote: Operation failed

on axqa01_2:

------               ----                                  -----  -----
axqa03_2             cb165cce-1798-daf9-1252-12a2347a9fc7  00002  00002
Domain-0             00000000-0000-0000-0000-000000000000  00002  00001
Storing axqa03_2
libvir: Xen Daemon error : GET operation failed:
Domain               UUID                                  Owner  State
------               ----                                  -----  -----
axqa03_2             cb165cce-1798-daf9-1252-12a2347a9fc7  00002  00002
Domain-0             00000000-0000-0000-0000-000000000000  00002  00001
Storing axqa03_2
Request to fence: axqa03_2
axqa03_2 is running locally
Plain TCP request
libvir: Xen Daemon error : GET operation failed:
libvir: error : invalid argument in __virGetDomain
libvir: Xen Store error : out of memory
tcp_response: no-op (AUTH_NONE)
tcp_challenge: no-op (AUTH_NONE)
Rebooting domain axqa03_2...
[[ XML Domain Info ]]
<domain type='xen'>
  <name>axqa03_2</name>
  <uuid>1732aae45a110676113df9e7da458b61</uuid>
  <os>
    <type>linux</type>
    <kernel>/var/lib/xen/boot/vmlinuz-2.6.18-52.el5xen</kernel>
    <initrd>/var/lib/xen/boot/initrd_sr-2.6.18-52.el5xen.img</initrd>
  </os>
  <currentMemory>366592</currentMemory>
  <memory>366592</memory>
  <vcpu>2</vcpu>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <disk type='block' device='disk'>
      <driver name='phy'/>
      <source dev='sds'/>
      <target dev='sds'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='file'/>
      <source file='/var/lib/xen/images/axqa03_2.localdisk.dd'/>
      <target dev='sda'/>
    </disk>
    <interface type='bridge'>
      <mac address='aa:00:00:00:00:12'/>
      <source bridge='xenbr0'/>
    </interface>
    <interface type='bridge'>
      <mac address='00:16:3e:43:90:d2'/>
      <source bridge='xenbr1'/>
    </interface>
    <console/>
  </devices>
</domain>
[[ XML END ]]
Virtual machine is Linux
Unlinkiking os block
[[ XML Domain Info (modified) ]]
<?xml version="1.0"?>
<domain type="xen">
  <name>axqa03_2</name>
  <uuid>1732aae45a110676113df9e7da458b61</uuid>
  <currentMemory>366592</currentMemory>
  <memory>366592</memory>
  <vcpu>2</vcpu>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <disk type="block" device="disk">
      <driver name="phy"/>
      <source dev="sds"/>
      <target dev="sds"/>
    </disk>
    <disk type="file" device="disk">
      <driver name="file"/>
      <source file="/var/lib/xen/images/axqa03_2.localdisk.dd"/>
      <target dev="sda"/>
    </disk>
    <interface type="bridge">
      <mac address="aa:00:00:00:00:12"/>
      <source bridge="xenbr0"/>
    </interface>
    <interface type="bridge">
      <mac address="00:16:3e:43:90:d2"/>
      <source bridge="xenbr1"/>
    </interface>
    <console/>
  </devices>
</domain>
[[ XML END ]]
[REBOOT] Calling virDomainDestroy
virDomainDestroy() failed: -1
Sending response to caller...
libvir: Xen Daemon error : GET operation failed:
Domain               UUID                                  Owner  State
------               ----                                  -----  -----
axqa03_2             cb165cce-1798-daf9-1252-12a2347a9fc7  00002  00002
Domain-0             00000000-0000-0000-0000-000000000000  00002  00001
Storing axqa03_2

on axqa01_1:

Domain               UUID                                  Owner  State
------               ----                                  -----  -----
axqa03_1             8f89affa-4330-d281-9622-98665e4816c2  00001  00002
Domain-0             00000000-0000-0000-0000-000000000000  00001  00001
Storing axqa03_1
Domain               UUID                                  Owner  State
------               ----                                  -----  -----
axqa03_1             8f89affa-4330-d281-9622-98665e4816c2  00001  00002
Domain-0             00000000-0000-0000-0000-000000000000  00001  00001
Storing axqa03_1
Request to fence: axqa03_2
Evaluating Domain: axqa03_2   Last Owner: 2   State 2
Domain               UUID                                  Owner  State
------               ----                                  -----  -----
axqa03_1             8f89affa-4330-d281-9622-98665e4816c2  00001  00002
Domain-0             00000000-0000-0000-0000-000000000000  00001  00001
Storing axqa03_1
Domain               UUID                                  Owner  State

Any ideas?

Marc.

>
> -- Lon
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster

--
Gruss / Regards,

Marc Grimme
http://www.atix.de/
http://www.open-sharedroot.org/

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster