On Fri, 27 Mar 2015, Michael Chapman wrote:
The destination libvirt daemon in a migration may segfault if the client disconnects immediately after the migration has begun: # virsh -c qemu+tls://remote/system list --all Id Name State ---------------------------------------------------- ... # timeout --signal KILL 1 \ virsh migrate example qemu+tls://remote/system \ --verbose --compressed --live --auto-converge \ --abort-on-error --unsafe --persistent \ --undefinesource --copy-storage-all --xml example.xml Killed # virsh -c qemu+tls://remote/system list --all error: failed to connect to the hypervisor error: unable to connect to server at 'remote:16514': Connection refused The crash is in: 1531 void 1532 qemuDomainObjEndJob(virQEMUDriverPtr driver, virDomainObjPtr obj) 1533 { 1534 qemuDomainObjPrivatePtr priv = obj->privateData; 1535 qemuDomainJob job = priv->job.active; 1536 1537 priv->jobs_queued--; Backtrace: #0 at qemuDomainObjEndJob at qemu/qemu_domain.c:1537 #1 in qemuDomainRemoveInactive at qemu/qemu_domain.c:2497 #2 in qemuProcessAutoDestroy at qemu/qemu_process.c:5646 #3 in virCloseCallbacksRun at util/virclosecallbacks.c:350 #4 in qemuConnectClose at qemu/qemu_driver.c:1154 ... qemuDomainRemoveInactive calls virDomainObjListRemove, which in this case is holding the last remaining reference to the domain. qemuDomainRemoveInactive then calls qemuDomainObjEndJob, but the domain object has been freed and poisoned by then. This patch bumps the domain's refcount until qemuDomainRemoveInactive has completed. We also ensure qemuProcessAutoDestroy does not return the domain to virCloseCallbacksRun to be unlocked in this case. There is similar logic in bhyveProcessAutoDestroy and lxcProcessAutoDestroy (which call virDomainObjListRemove directly).
Please ignore this. I've got a few other fixes related to cancelled VM migrations, so I'll send through all in one patch series.
- Michael -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list