Hi list,
(Let me know if this belongs on the xen list instead, but I think it's
an issue with clvmd locking a logical volume.)
I have a three-node RHEL5 cluster running some virtual machines. Each
virtual machine uses an LVM logical volume as its root disk, made
available cluster-wide via clvmd.
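For reference, each guest's disk stanza in its domU config looks roughly
like the following (reconstructed from the xend log further down, so
treat it as illustrative rather than a verbatim copy of my config):

```
# Xen domU config fragment: phy-backed disk on the clustered LV,
# exported to the guest as xvda, writable.
disk = [ 'phy:/dev/int_vg/os_nodea,xvda,w' ]
```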
Live migration between cluster nodes works well when running strictly
one VM per node, but fails when a node is running more than one
virtual machine.
I can migrate my two VMs, "nodea" and "nodeb", onto the same physical
node and they run fine:
# xm list
Name        ID  Mem(MiB)  VCPUs  State    Time(s)
Domain-0     0      4120      4  r-----    3398.3
nodea        9      5999      1  -b----       0.3
nodeb        4      5999      1  -b----     265.9
However, when I try to migrate one of these VMs *away* from this
physical node to another cluster member (using clusvcadm -M), it
performs the state transfer, then I get a nasty error on the VM's
console and end up with a broken virtual machine on both physical
nodes:
WARNING: g.e. still in use!
WARNING: leaking g.e. and page still in use!
WARNING: g.e. still in use!
WARNING: leaking g.e. and page still in use!
netif_release_rx_bufs: 0 xfer, 62 noxfer, 194 unused
WARNING: g.e. still in use!
WARNING: leaking g.e. and page still in use!
Sorry for the long email, but I'll also include the xend log from the
source physical server showing the failure. You can see that device
51712 is 'still active' during the migration attempt, and device 51712
is the LV block device; I assume this means xend is having trouble
releasing a CLVM lock?
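(To confirm that 51712 really is the root disk: Xen encodes xvd*
device numbers as major 202 in the top byte and 16 * disk + partition
in the low byte, so 51712 decodes to xvda. A quick sketch of the
decoding — the helper name is mine, not anything from xend:)

```python
# Decode a Xen virtual-device number into its xvd* name.
# Xen uses block major 202 for xvd devices; the minor byte
# encodes 16 * disk_index + partition_number.
XEN_XVD_MAJOR = 202

def decode_xvd(devnum):
    major, minor = devnum >> 8, devnum & 0xFF
    if major != XEN_XVD_MAJOR:
        raise ValueError("not an xvd device number: %d" % devnum)
    disk, part = divmod(minor, 16)
    name = "xvd" + chr(ord("a") + disk)
    return name if part == 0 else name + str(part)

print(decode_xvd(51712))  # the 'still active' device from the log
```

So the device xend keeps looping on is exactly the LV-backed xvda.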
[2008-08-24 00:27:43 xend 5252] DEBUG (balloon:127) Balloon: 26652 KiB free; need 25600; done.
[2008-08-24 00:27:43 xend 5252] DEBUG (XendCheckpoint:89) [xc_save]: /usr/lib64/xen/bin/xc_save 22 9 0 0 1
[2008-08-24 00:27:43 xend 5252] INFO (XendCheckpoint:351) ERROR Internal error: Couldn't enable shadow mode
[2008-08-24 00:27:43 xend 5252] INFO (XendCheckpoint:351) Save exit rc=1
[2008-08-24 00:27:43 xend 5252] ERROR (XendCheckpoint:133) Save failed on domain nodea (9).
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 110, in save
    forkHelper(cmd, fd, saveInputHandler, False)
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 339, in forkHelper
    raise XendError("%s failed" % string.join(cmd))
XendError: /usr/lib64/xen/bin/xc_save 22 9 0 0 1 failed
[2008-08-24 00:27:43 xend.XendDomainInfo 5252] DEBUG (XendDomainInfo:1601) XendDomainInfo.resumeDomain(9)
[2008-08-24 00:27:43 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] DEBUG (XendDomainInfo:1614) XendDomainInfo.resumeDomain: devices released
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] DEBUG (XendDomainInfo:791) Storing domain details: {'console/ring-ref': '2057005', 'console/port': '2', 'name': 'migrating-nodea', 'console/limit': '1048576', 'vm': '/vm/b845f914-33a3-e1cf-551e-01b6d346b92b', 'domid': '9', 'cpu/0/availability': 'online', 'memory/target': '6144000', 'store/ring-ref': '2049294', 'store/port': '1'}
[2008-08-24 00:27:44 xend 5252] DEBUG (DevController:110) DevController: writing {'backend-id': '0', 'mac': '00:16:3e:6c:ae:9f', 'handle': '0', 'state': '1', 'backend': '/local/domain/0/backend/vif/9/0'} to /local/domain/9/device/vif/0.
[2008-08-24 00:27:44 xend 5252] DEBUG (DevController:112) DevController: writing {'bridge': 'br102', 'domain': 'migrating-nodea', 'handle': '0', 'script': '/etc/xen/scripts/vif-bridge', 'state': '1', 'frontend': '/local/domain/9/device/vif/0', 'mac': '00:16:3e:6c:ae:9f', 'online': '1', 'frontend-id': '9'} to /local/domain/0/backend/vif/9/0.
[2008-08-24 00:27:44 xend 5252] DEBUG (blkif:24) exception looking up device number for xvda: [Errno 2] No such file or directory: '/dev/xvda'
[2008-08-24 00:27:44 xend 5252] DEBUG (DevController:110) DevController: writing {'backend-id': '0', 'virtual-device': '51712', 'device-type': 'disk', 'state': '1', 'backend': '/local/domain/0/backend/vbd/9/51712'} to /local/domain/9/device/vbd/51712.
[2008-08-24 00:27:44 xend 5252] DEBUG (DevController:112) DevController: writing {'domain': 'migrating-nodea', 'frontend': '/local/domain/9/device/vbd/51712', 'format': 'raw', 'dev': 'xvda', 'state': '1', 'params': '/dev/int_vg/os_nodea', 'mode': 'w', 'online': '1', 'frontend-id': '9', 'type': 'phy'} to /local/domain/0/backend/vbd/9/51712.
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] DEBUG (XendDomainInfo:1626) XendDomainInfo.resumeDomain: devices created
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] ERROR (XendDomainInfo:1631) XendDomainInfo.resume: xc.domain_resume failed on domain 9.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 1628, in resumeDomain
    xc.domain_resume(self.domid, fast)
Error: (1, 'Internal error', "Couldn't map start_info")
[2008-08-24 00:27:44 xend 5252] DEBUG (XendCheckpoint:136) XendCheckpoint.save: resumeDomain
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:44 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
[2008-08-24 00:27:45 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1722) Dev 51712 still active, looping...
-------many repeats-------
[2008-08-24 00:28:14 xend.XendDomainInfo 5252] INFO (XendDomainInfo:1728) Dev still active but hit max loop timeout
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster