Virtual service using CLVM not migrating

Hi list,

(Let me know if this should be on the Xen list, but I think it's an issue with CLVM locking a logical volume.)

I have a three-node RHEL5 cluster running some virtual machines. Each virtual machine uses an LVM LV as its root device, which is available cluster-wide via clvmd.

Live migration between cluster nodes seems to work well when running one-vm-per-node exclusively, but fails when a node is running more than one virtual machine.

I can migrate my two VMs, "nodea" and "nodeb", onto the same physical node and they run fine:

# xm list
Name                                   ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                0     4120     4 r-----   3398.3
nodea                                   9     5999     1 -b----      0.3
nodeb                                   4     5999     1 -b----    265.9


However, when I try to migrate one of these VMs *away* from this physical node to another cluster member (using clusvcadm -M), it performs the state transfer and then I get a nasty error on the VM's console. I end up with a broken virtual machine on both physical nodes:

WARNING: g.e. still in use!
WARNING: leaking g.e. and page still in use!
WARNING: g.e. still in use!
WARNING: leaking g.e. and page still in use!
netif_release_rx_bufs: 0 xfer, 62 noxfer, 194 unused
WARNING: g.e. still in use!
WARNING: leaking g.e. and page still in use!


Sorry for the large email, but I'll also include the xend log on the source physical server showing the failure. You can see that device 51712 is 'still active' while trying to migrate and that device 51712 is the LV block device; I assume this means it is having trouble removing a CLVM lock?
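
As a sanity check that 51712 really is the guest's xvda disk, here is a minimal sketch that decodes the number, assuming the classic encoding of major * 256 + minor and the Linux block major 202 used for xvd devices. The function names are made up for illustration and are not part of xend.

```python
XVD_MAJOR = 202  # Linux block major assigned to Xen virtual disks (xvd*)


def decode_xen_vdev(devnum):
    """Split a virtual-device number into (major, minor).

    Assumes the classic 8-bit major / 8-bit minor encoding.
    """
    return devnum >> 8, devnum & 0xFF


def xvd_name(devnum):
    """Return a name such as 'xvda' or 'xvda1' for an xvd device number."""
    major, minor = decode_xen_vdev(devnum)
    if major != XVD_MAJOR:
        raise ValueError("major %d is not an xvd device" % major)
    disk, part = divmod(minor, 16)  # 16 minors are reserved per xvd disk
    name = "xvd" + chr(ord("a") + disk)
    return name + (str(part) if part else "")


print(decode_xen_vdev(51712))
print(xvd_name(51712))
```

This agrees with the vbd entry further down in the log, where 'virtual-device': '51712' is written alongside 'dev': 'xvda' and 'params': '/dev/int_vg/os_nodea'.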


[2008-08-24 00:27:43 xend 5252] DEBUG (balloon:127) Balloon: 26652 KiB free; need 25600; done.
[2008-08-24 00:27:43 xend 5252] DEBUG (XendCheckpoint:89) [xc_save]: /usr/lib64/xen/bin/xc_save 22 9 0 0 1
[2008-08-24 00:27:43 xend 5252] INFO (XendCheckpoint:351) ERROR Internal error: Couldn't enable shadow mode
[2008-08-24 00:27:43 xend 5252] INFO (XendCheckpoint:351) Save exit rc=1
[2008-08-24 00:27:43 xend 5252] ERROR (XendCheckpoint:133) Save failed on domain nodea (9).
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 110, in save
    forkHelper(cmd, fd, saveInputHandler, False)
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 339, in forkHelper
    raise XendError("%s failed" % string.join(cmd))
XendError: /usr/lib64/xen/bin/xc_save 22 9 0 0 1 failed
[2008-08-24 00:27:43 xend.XendDomainInfo 5252] DEBUG (XendDomainInfo:1601) XendDomainInfo.resumeDomain(9)
[2008-08-24 00:27:43 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] DEBUG (XendDomainInfo:1614) XendDomainInfo.resumeDomain: devices released
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] DEBUG (XendDomainInfo:791) Storing domain details: {'console/ring-ref': '2057005', 'console/port': '2', 'name': 'migrating-nodea', 'console/limit': '1048576', 'vm': '/vm/b845f914-33a3-e1cf-551e-01b6d346b92b', 'domid': '9', 'cpu/0/availability': 'online', 'memory/target': '6144000', 'store/ring-ref': '2049294', 'store/port': '1'}
[2008-08-24 00:27:44 xend 5252] DEBUG (DevController:110) DevController: writing {'backend-id': '0', 'mac': '00:16:3e:6c:ae:9f', 'handle': '0', 'state': '1', 'backend': '/local/domain/0/backend/vif/9/0'} to /local/domain/9/device/vif/0.
[2008-08-24 00:27:44 xend 5252] DEBUG (DevController:112) DevController: writing {'bridge': 'br102', 'domain': 'migrating-nodea', 'handle': '0', 'script': '/etc/xen/scripts/vif-bridge', 'state': '1', 'frontend': '/local/domain/9/device/vif/0', 'mac': '00:16:3e:6c:ae:9f', 'online': '1', 'frontend-id': '9'} to /local/domain/0/backend/vif/9/0.
[2008-08-24 00:27:44 xend 5252] DEBUG (blkif:24) exception looking up device number for xvda: [Errno 2] No such file or directory: '/dev/xvda'
[2008-08-24 00:27:44 xend 5252] DEBUG (DevController:110) DevController: writing {'backend-id': '0', 'virtual-device': '51712', 'device-type': 'disk', 'state': '1', 'backend': '/local/domain/0/backend/vbd/9/51712'} to /local/domain/9/device/vbd/51712.
[2008-08-24 00:27:44 xend 5252] DEBUG (DevController:112) DevController: writing {'domain': 'migrating-nodea', 'frontend': '/local/domain/9/device/vbd/51712', 'format': 'raw', 'dev': 'xvda', 'state': '1', 'params': '/dev/int_vg/os_nodea', 'mode': 'w', 'online': '1', 'frontend-id': '9', 'type': 'phy'} to /local/domain/0/backend/vbd/9/51712.
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] DEBUG (XendDomainInfo:1626) XendDomainInfo.resumeDomain: devices created
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] ERROR (XendDomainInfo:1631) XendDomainInfo.resume: xc.domain_resume failed on domain 9.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 1628, in resumeDomain
    xc.domain_resume(self.domid, fast)
Error: (1, 'Internal error', "Couldn't map start_info")
[2008-08-24 00:27:44 xend 5252] DEBUG (XendCheckpoint:136) XendCheckpoint.save: resumeDomain
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:45 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
-------many repeats-------
[2008-08-24 00:28:14 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1728) Dev still active but hit max loop timeout


--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster