On Fri, Apr 07, 2017 at 02:12:31PM +0200, Kashyap Chamarthy wrote:
> On Fri, Apr 07, 2017 at 08:22:01AM +0200, Jiri Denemark wrote:
> > On Thu, Apr 06, 2017 at 18:14:07 +0200, Kashyap Chamarthy wrote:
> > > [Filed this bug -- https://bugzilla.redhat.com/show_bug.cgi?id=1439841]
> > >
> > > Easy reproducer:
> > >
> > >     $ virsh migrate --verbose --copy-storage-all \
> > >         --p2p --live l2-f25 qemu+ssh://root@devstack-a/system
> > >     error: invalid argument: monitor must not be NULL
> >
> > This is caused by the TLS migration code and most likely fixed by
> > https://www.redhat.com/archives/libvir-list/2017-April/msg00219.html
>
> Thanks.  I'll test with your series & report back on that thread.

[Since the above series is pushed, responding here.]

I just built (RPMs) from libvirt Git, which has the above series ("qemu:
Properly reset all migration capabilities").  I was here when I tested
it:

    $ git describe
    v3.2.0-80-gbe193c4

I did two tests (same reproducer command-line as above):

(Test-1) Migrate a guest from source to destination:

    Result: Succeeds (the migrated guest successfully runs on the
    destination)

(Test-2) Once 'Test-1' finished successfully, and the guest is running
         successfully on the destination, migrate it back to source:

    Result: Fails.

    $ virsh migrate --verbose --copy-storage-all \
        --p2p --live l2-f25 qemu+ssh://root@l1-f25/system
    error: operation failed: migration job: is not active

Looking at the source debug log (URLs to complete logs further below), I
see the dreaded "cannot acquire state change lock" error.

[...]
2017-04-10 06:29:23.322+0000: 22676: warning : qemuDomainObjBeginJobInternal:3607 : Cannot start job (modify, none) for domain l2-f25; current job is (none, migration out) owned by (0 <null> , 16698 remoteDispatchDomainMigratePerform3Params) for (0s, 96s)
2017-04-10 06:29:23.322+0000: 22676: error : qemuDomainObjBeginJobInternal:3619 : Timed out during operation: cannot acquire state change lock (held by +remoteDispatchDomainMigratePerform3Params)
[...]
2017-04-10 06:31:57.525+0000: 16698: error : qemuMigrationCheckJobStatus:1420 : operation failed: migration job: is not active
2017-04-10 06:31:57.525+0000: 16698: debug : qemuMigrationCancelDriveMirror:785 : Cancelling drive mirrors for domain l2-f25
[...]
2017-04-10 06:31:57.538+0000: 16698: debug : qemuMigrationDriveMirrorCancelled:700 : All disk mirrors are gone
2017-04-10 06:31:57.538+0000: 16698: debug : doPeer2PeerMigrate3:4428 : Finish3 0x7f39d801e3d0 ret=-1
2017-04-10 06:31:57.539+0000: 16698: debug : qemuDomainObjEnterRemote:3918 : Entering remote (vm=0x563b26a60e60 name=l2-f25)
2017-04-10 06:31:57.783+0000: 16698: error : virNetClientProgramDispatchError:177 : migration successfully aborted
[...]

Complete libvirt debug logs (with appropriate log filters):

  - libvirtd debug log of source host (after a failed migration from
    destination to source) --
    https://bugzilla.redhat.com/attachment.cgi?id=1270407

  - libvirtd debug log of destination host (after a failed migration
    from destination to source) --
    https://bugzilla.redhat.com/attachment.cgi?id=1270406

-- 
/kashyap
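
A side note for anyone digging through the complete debug logs: a rough
Python sketch along the following lines could pull out the "Cannot start
job" warnings and show which job was holding the lock.  The regular
expression is modelled only on the single warning quoted above, so other
libvirt versions may need a different pattern, and find_blocked_jobs is a
hypothetical helper name, not anything provided by libvirt.

```python
import re
import sys

# Pattern modelled on the one "Cannot start job" warning quoted in this
# mail (an assumption; other libvirt versions may word it differently).
JOB_WARNING = re.compile(
    r"^(?P<timestamp>\S+ \S+): (?P<thread>\d+): warning : "
    r"qemuDomainObjBeginJobInternal:\d+ : "
    r"Cannot start job \((?P<wanted>[^)]*)\) for domain (?P<domain>\S+); "
    r"current job is \((?P<current>[^)]*)\) "
    r"owned by \((?P<owners>[^)]*)\) "
    r"for \((?P<durations>[^)]*)\)"
)


def find_blocked_jobs(lines):
    """Return one dict per 'Cannot start job' warning found in lines.

    Hypothetical helper: each dict holds the named groups of JOB_WARNING
    (timestamp, thread, wanted, domain, current, owners, durations).
    """
    blocked = []
    for line in lines:
        match = JOB_WARNING.search(line)
        if match:
            blocked.append(match.groupdict())
    return blocked


if __name__ == "__main__":
    # Usage: python find_blocked_jobs.py /path/to/libvirtd-debug.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for entry in find_blocked_jobs(log):
            print(
                f"{entry['timestamp']} thread {entry['thread']}: "
                f"wanted ({entry['wanted']}) on {entry['domain']}, "
                f"blocked by ({entry['current']}) "
                f"owned by ({entry['owners']})"
            )
```

Run against the source host log, this should list every point where a
thread gave up waiting on the migration job, which makes it easier to
line those timestamps up with the destination host log.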