Problem with Xen4CentOS


 



Hi folks,

We (the company I work for) run several dozen virtualisation servers using CentOS 6 + Xen4CentOS as the virtualisation infrastructure.

With the latest versions of all packages installed [1], we see failures in live-migration of stock-CentOS6 HVM guests, leaving a "Domain-unnamed" on the source host, while the migrated guest runs fine on the target host.

Domain-0                                     0  2048    64 r----- 6791.5
Domain-Unnamed                               1  4099     4 --ps--     94.8

The failure is not consistently reproducible: some guests (of the same type) live-migrate just fine, until eventually some seemingly random guest fails, leaving a "Domain-unnamed" zombie.

That Domain-unnamed causes several problems:
- The memory allocated to Domain-unnamed remains blocked, thus creating a veritable 'memory-leak' to the host
- The DomU causing Domain-unnamed cannot be restarted on the host, as xm thinks it's already running
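
To make the symptom easier to spot across many hosts, here is a minimal sketch (not something we run in production) that picks leftover "Domain-Unnamed" entries out of `xm list` output and sums the memory they still hold. It assumes the usual column layout of Name, ID, Mem, VCPUs, State, Time(s); the function name `find_zombies` and the embedded sample are purely illustrative.

```python
# Sketch: find leftover "Domain-Unnamed" entries in `xm list` output.
# Assumption: columns are Name, ID, Mem (MiB), VCPUs, State, Time(s).

def find_zombies(xm_list_output):
    """Return one dict per Domain-Unnamed entry found in the text."""
    zombies = []
    for line in xm_list_output.splitlines():
        # Split from the right so the five trailing columns are always
        # separated cleanly from the name.
        fields = line.rsplit(None, 5)
        if len(fields) != 6:
            continue
        name, domid, mem_mib, vcpus, state, cpu_time = fields
        if name.strip().lower() != "domain-unnamed":
            continue
        zombies.append({
            "domid": int(domid),
            "mem_mib": int(mem_mib),
            "state": state,
        })
    return zombies


# Sample taken from the listing above.
SAMPLE = """\
Domain-0                                     0  2048    64 r----- 6791.5
Domain-Unnamed                               1  4099     4 --ps--     94.8
"""

zombies = find_zombies(SAMPLE)
blocked_mib = sum(z["mem_mib"] for z in zombies)
print(zombies)
print("memory still held by zombies (MiB):", blocked_mib)
```

In practice the text would come from running `xm list` on each host rather than from a hard-coded sample.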

I have tried various things to get rid of Domain-unnamed, all without success:
- multiple xm destroy
- restart xend
- delete everything regarding Domain-unnamed in xenstore with xenstore-rm. The removal is successful, but the domain remains. Restarting xend after the deletion restores Domain-unnamed in xenstore

So far, the only way to get rid of Domain-unnamed is to reboot the virtualisation host. As these hosts are all quad-socket Opteron 6272 machines with 256 GB of RAM running dozens of guests, this is highly impractical.

I have seen this behaviour using Xen 4.2.5. The previous 4.2.4 versions did not show this problem, however we did not use live-migration extensively prior to that. Before switching to Xen4CentOS, we used to build our own Xen 4.2.2 based on a git repo published by Karanbir Singh. We had several issues with that version, but never observed a "Domain-unnamed".

Any idea how to resolve this issue would be highly appreciated, as working live-migration is crucial to us.

Regards,
Thomas Weyergraf

Some notes on our config:

1. We still use xm/xend for various reasons
----
2. Our grub-config for the virtx-hosts is as follows:
----
default=0
timeout=5
#splashimage=(hd0,0)/grub/splash.xpm.gz
#hiddenmenu
title CentOS (xen-4.2.5-37.el6.gz vmlinuz-3.10.56-11.el6.centos.alt.x86_64)
        root (hd0,0)
        kernel /xen-4.2.5-37.el6.gz iommu=1 console=vga,com1 com1=115200,8n1 vga=text-80x25 dom0_mem=2048M,max:2048M
        module /vmlinuz-3.10.56-11.el6.centos.alt.x86_64 ro xencons=hvc0 console=hvc0 root=/dev/fravirtx68/root rd_NO_LUKS LANG=en_US.UTF-8 KEYBOARDTYPE=pc KEYTABLE=de-latin1-nodeadkeys rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_LVM_LV=fravirtx68/root rd_NO_DM
        module /initramfs-3.10.56-11.el6.centos.alt.x86_64.img
title CentOS (vmlinuz-3.10.56-11.el6.centos.alt.x86_64)
        root (hd0,0)
        kernel /vmlinuz-3.10.56-11.el6.centos.alt.x86_64 ro root=/dev/fravirtx68/root rd_NO_LUKS LANG=en_US.UTF-8 KEYBOARDTYPE=pc KEYTABLE=de-latin1-nodeadkeys rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_LVM_LV=fravirtx68/root rd_NO_DM
        module /initramfs-3.10.56-11.el6.centos.alt.x86_64.img

3. A typical guest-config looks like:
----
name = "fraappmgmt05t.test.fra.net-m.internal"
uuid = "3778a443-9194-4c46-adff-211d7fcc24da"
memory = "4096"
vcpus = 4
kernel = "hvmloader"
builder = 'hvm'
disk = [ 'phy:/dev/disk/by-path/ip-192.168.240.7:3260-iscsi-iqn.1992-08.com.netapp:navfiler21-lun-56,xvda,w', 'phy:/dev/disk/by-path/ip-192.168.240.7:3260-iscsi-iqn.1992-08.com.netapp:navfiler21-lun-57,xvdb,w', ]
vif = [ 'mac=00:16:3e:fa:15:4a,bridge=xenbr11' ]
device_model = 'qemu-dm'
serial='pty'
xen_platform_pci=1
on_poweroff = "destroy"
on_crash = "restart"

4. The xend.log excerpt of the migration process from the source host:
----
[2014-11-06 22:53:01 13499] DEBUG (XendDomainInfo:1795) Storing domain details: {'console/port': '7', 'cpu/3/availability': 'online', 'description': '', 'console/limit': '1048576', 'cpu/2/availability': 'online', 'vm': '/vm/f5139575-984b-4c28-b470-efc042ba2703', 'domid': '1', 'store/port': '6', 'console/type': 'ioemu', 'cpu/0/availability': 'online', 'memory/target': '4194304', 'control/platform-feature-multiprocessor-suspend': '1', 'store/ring-ref': '1044476', 'cpu/1/availability': 'online', 'control/platform-feature-xs_reset_watches': '1', 'image/suspend-cancel': '1', 'name': 'migrating-fraapppeccon06.fra.net-m.internal'}
[2014-11-06 22:53:01 13499] INFO (XendCheckpoint:423) xc_save: failed to get the suspend evtchn port
[2014-11-06 22:53:01 13499] INFO (XendCheckpoint:423)
[2014-11-06 22:53:34 13499] DEBUG (XendCheckpoint:394) suspend
[2014-11-06 22:53:34 13499] DEBUG (XendCheckpoint:127) In saveInputHandler suspend
[2014-11-06 22:53:34 13499] DEBUG (XendCheckpoint:129) Suspending 1 ...
[2014-11-06 22:53:34 13499] DEBUG (XendDomainInfo:524) XendDomainInfo.shutdown(suspend)
[2014-11-06 22:53:34 13499] DEBUG (XendDomainInfo:1882) XendDomainInfo.handleShutdownWatch
[2014-11-06 22:53:34 13499] DEBUG (XendDomainInfo:1882) XendDomainInfo.handleShutdownWatch
[2014-11-06 22:53:34 13499] INFO (XendDomainInfo:2079) Domain has shutdown: name=migrating-fraapppeccon06.fra.net-m.internal id=1 reason=suspend.
[2014-11-06 22:53:34 13499] INFO (XendCheckpoint:135) Domain 1 suspended.
[2014-11-06 22:53:35 13499] INFO (image:542) signalDeviceModel:restore dm state to running
[2014-11-06 22:53:35 13499] DEBUG (XendCheckpoint:144) Written done
[2014-11-06 22:53:35 13499] DEBUG (XendDomainInfo:3077) XendDomainInfo.destroy: domid=1
[2014-11-06 22:53:35 13499] ERROR (XendDomainInfo:3091) XendDomainInfo.destroy: domain destruction failed.
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/xen/xend/XendDomainInfo.py", line 3086, in destroy
    xc.domain_destroy(self.domid)
Error: (16, 'Device or resource busy')
[2014-11-06 22:53:35 13499] DEBUG (XendDomainInfo:2402) Destroying device model
[2014-11-06 22:53:35 13499] INFO (image:619) migrating-fraapppeccon06.fra.net-m.internal device model terminated
[2014-11-06 22:53:35 13499] DEBUG (XendDomainInfo:2409) Releasing devices
[2014-11-06 22:53:35 13499] DEBUG (XendDomainInfo:2415) Removing vif/0
[2014-11-06 22:53:35 13499] DEBUG (XendDomainInfo:1276) XendDomainInfo.destroyDevice: deviceClass = vif, device = vif/0
[2014-11-06 22:53:35 13499] DEBUG (XendDomainInfo:2415) Removing console/0
[2014-11-06 22:53:35 13499] DEBUG (XendDomainInfo:1276) XendDomainInfo.destroyDevice: deviceClass = console, device = console/0
[2014-11-06 22:53:35 13499] DEBUG (XendDomainInfo:2415) Removing vbd/51712
[2014-11-06 22:53:35 13499] DEBUG (XendDomainInfo:1276) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/51712
[2014-11-06 22:53:35 13499] DEBUG (XendDomainInfo:2415) Removing vbd/51728
[2014-11-06 22:53:35 13499] DEBUG (XendDomainInfo:1276) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/51728
[2014-11-06 22:53:36 13499] DEBUG (XendCheckpoint:124) [xc_save]: /usr/lib/xen/bin/xc_save 26 2 0 0 5
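
The interesting part of the excerpt above is the failed `xc.domain_destroy` call, which returns error 16 (EBUSY on Linux). As a rough sketch for checking other hosts for the same failure, the following scans xend.log lines for the "domain destruction failed" ERROR entry and picks up the error tuple from the traceback that follows. The regular expressions are assumptions derived only from the log lines shown here, and `find_destroy_failures` is a hypothetical helper name.

```python
import re

# Assumed xend.log line prefix: "[date time pid] LEVEL (source:line) message"
LINE_RE = re.compile(
    r"^\[(?P<ts>[\d-]+ [\d:]+) (?P<pid>\d+)\] "
    r"(?P<level>[A-Z]+) \((?P<src>[^)]+)\) ?(?P<msg>.*)$"
)
# Last line of the traceback, e.g. "Error: (16, 'Device or resource busy')"
ERRNO_RE = re.compile(r"^Error: \((\d+), '([^']*)'\)")


def find_destroy_failures(lines):
    """Collect failed domain destructions and the error that followed."""
    failures = []
    pending = None  # failure whose traceback we are currently reading
    for raw in lines:
        line = raw.strip()
        m = LINE_RE.match(line)
        if m:
            # A new prefixed log entry ends any traceback in progress.
            pending = None
            if (m.group("level") == "ERROR"
                    and "domain destruction failed" in m.group("msg")):
                pending = {"timestamp": m.group("ts"),
                           "errno": None, "text": None}
                failures.append(pending)
            continue
        e = ERRNO_RE.match(line)
        if e and pending is not None:
            pending["errno"] = int(e.group(1))
            pending["text"] = e.group(2)
    return failures


# Sample lines copied from the excerpt above.
SAMPLE_LOG = """\
[2014-11-06 22:53:35 13499] DEBUG (XendDomainInfo:3077) XendDomainInfo.destroy: domid=1
[2014-11-06 22:53:35 13499] ERROR (XendDomainInfo:3091) XendDomainInfo.destroy: domain destruction failed.
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/xen/xend/XendDomainInfo.py", line 3086, in destroy
    xc.domain_destroy(self.domid)
Error: (16, 'Device or resource busy')
[2014-11-06 22:53:35 13499] DEBUG (XendDomainInfo:2402) Destroying device model
"""

failures = find_destroy_failures(SAMPLE_LOG.splitlines())
print(failures)
```

On a real host the lines would be read from the xend log file instead of the embedded sample; this only detects the failure and does nothing to clear the leftover domain.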
_______________________________________________
CentOS-virt mailing list
CentOS-virt@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos-virt



