Thanks at first, this patch some kinda solve my problem until now. But I still have a doubt about this patch. > -----Original Message----- > From: libvir-list-bounces@xxxxxxxxxx > [mailto:libvir-list-bounces@xxxxxxxxxx] On Behalf Of Michal Privoznik > Sent: Friday, October 11, 2013 8:15 PM > To: libvir-list@xxxxxxxxxx > Subject: [PATCH v2] qemu_migration: Avoid crashing if domain dies > too quickly > > I've noticed a SIGSEGV-ing libvirtd on the destination when the qemu > died too quickly = in Prepare phase. What is happening here is: > > 1) [Thread 3493] We are in qemuMigrationPrepareAny() and calling > qemuProcessStart() which subsequently calls qemuProcessWaitForMonitor() > and qemuConnectMonitor(). So far so good. The qemuMonitorOpen() > succeeds, however switching monitor to QMP mode fails as qemu died > meanwhile. That is qemuMonitorSetCapabilities() returns -1. > > 2013-10-08 15:54:10.629+0000: 3493: debug : > qemuMonitorSetCapabilities:1356 : mon=0x14a53da0 > 2013-10-08 15:54:10.630+0000: 3493: debug : > qemuMonitorJSONCommandWithFd:262 : Send command > '{"execute":"qmp_capabilities","id":"libvirt-1"}' for write with FD -1 > 2013-10-08 15:54:10.630+0000: 3493: debug : > virEventPollUpdateHandle:147 : EVENT_POLL_UPDATE_HANDLE: watch=17 > events=13 > ... > 2013-10-08 15:54:10.631+0000: 3493: debug : qemuMonitorSend:956 : > QEMU_MONITOR_SEND_MSG: mon=0x14a53da0 > msg={"execute":"qmp_capabilities","id":"libvirt-1"} > fd=-1 > 2013-10-08 15:54:10.631+0000: 3262: debug : virEventPollRunOnce:641 : > Poll got 1 event(s) > > 2) [Thread 3262] The event loop is trying to do the talking to monitor. > However, qemu is dead already, remember? > > 2013-10-08 15:54:13.436+0000: 3262: error : qemuMonitorIORead:551 : > Unable to read from monitor: Connection reset by peer > 2013-10-08 15:54:13.516+0000: 3262: debug : virFileClose:90 : Closed fd 25 > ... > 2013-10-08 15:54:13.533+0000: 3493: debug : qemuMonitorSend:968 : Send > command resulted in error internal error: early end of file from monitor: > possible problem: > > 3) [Thread 3493] qemuProcessStart() failed. No big deal. Go to the > 'endjob' label and subsequently to the 'cleanup'. Since the domain is > not persistent and ret is -1, the qemuDomainRemoveInactive() is called. > This has an (unpleasant) effect of virObjectUnref()-in the @vm object. > Unpleasant because the event loop which is about to trigger EOF callback > still holds a pointer to the @vm (not the reference). See the valgrind > output below. > > 4) [Thread 3262] So the even loop starts triggering EOF: > > 2013-10-08 15:54:13.542+0000: 3262: debug : qemuMonitorIO:729 : > Triggering EOF callback > 2013-10-08 15:54:13.543+0000: 3262: debug : > qemuProcessHandleMonitorEOF:294 : Received EOF on 0x14549110 'migt10' > > And the monitor is cleaned up. This results in calling > qemuProcessHandleMonitorEOF with the @vm pointer passed. The pointer > is > kept in qemuMonitor struct. > > ==3262== Thread 1: > ==3262== Invalid read of size 4 > ==3262== at 0x77ECCAA: pthread_mutex_lock (in > /lib64/libpthread-2.15.so) > ==3262== by 0x52FAA06: virMutexLock (virthreadpthread.c:85) > ==3262== by 0x52E3891: virObjectLock (virobject.c:320) > ==3262== by 0x11626743: qemuProcessHandleMonitorEOF > (qemu_process.c:296) > ==3262== by 0x11642593: qemuMonitorIO (qemu_monitor.c:730) > ==3262== by 0x52BD526: virEventPollDispatchHandles > (vireventpoll.c:501) > ==3262== by 0x52BDD49: virEventPollRunOnce (vireventpoll.c:648) > ==3262== by 0x52BBC68: virEventRunDefaultImpl (virevent.c:274) > ==3262== by 0x542D3D9: virNetServerRun (virnetserver.c:1112) > ==3262== by 0x11F368: main (libvirtd.c:1513) > ==3262== Address 0x14549128 is 24 bytes inside a block of size 136 free'd > ==3262== at 0x4C2AF5C: free (in > /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) > ==3262== by 0x529B1FF: virFree (viralloc.c:580) > ==3262== by 0x52E3703: virObjectUnref (virobject.c:270) > ==3262== by 0x531557E: virDomainObjListRemove > (domain_conf.c:2355) > ==3262== by 0x1160E899: qemuDomainRemoveInactive > (qemu_domain.c:2061) > ==3262== by 0x1163A0C6: qemuMigrationPrepareAny > (qemu_migration.c:2450) > ==3262== by 0x1163A923: qemuMigrationPrepareDirect > (qemu_migration.c:2626) > ==3262== by 0x11682D71: qemuDomainMigratePrepare3Params > (qemu_driver.c:10309) > ==3262== by 0x53B0976: virDomainMigratePrepare3Params > (libvirt.c:7266) > ==3262== by 0x1502D3: > remoteDispatchDomainMigratePrepare3Params (remote.c:4797) > ==3262== by 0x12DECA: > remoteDispatchDomainMigratePrepare3ParamsHelper > (remote_dispatch.h:5741) > ==3262== by 0x54322EB: virNetServerProgramDispatchCall > (virnetserverprogram.c:435) > > The mon->vm is set in qemuMonitorOpenInternal() which is the correct > place to increase @vm ref counter. The correct place to decrease the ref > counter is then qemuMonitorDispose(). > > Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx> > --- > > It turned out the hack from v1 is no longer needed. In fact, my reasoning > there > flavouring the hack was flawed. > > src/qemu/qemu_capabilities.c | 14 ++++++++++---- > src/qemu/qemu_monitor.c | 4 +++- > 2 files changed, 13 insertions(+), 5 deletions(-) > > diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c > index 7c39c1c..17095b4 100644 > --- a/src/qemu/qemu_capabilities.c > +++ b/src/qemu/qemu_capabilities.c > @@ -2587,7 +2587,8 @@ virQEMUCapsInitQMP(virQEMUCapsPtr > qemuCaps, > char *monpath = NULL; > char *pidfile = NULL; > pid_t pid = 0; > - virDomainObj vm; > + virDomainObjPtr vm = NULL; > + virDomainXMLOptionPtr xmlopt = NULL; > > /* the ".sock" sufix is important to avoid a possible clash with a qemu > * domain called "capabilities" > @@ -2650,10 +2651,13 @@ virQEMUCapsInitQMP(virQEMUCapsPtr > qemuCaps, > goto cleanup; > } > > - memset(&vm, 0, sizeof(vm)); > - vm.pid = pid; > + if (!(xmlopt = virDomainXMLOptionNew(NULL, NULL, NULL)) || > + !(vm = virDomainObjNew(xmlopt))) > + goto cleanup; > + > + vm->pid = pid; > > - if (!(mon = qemuMonitorOpen(&vm, &config, true, &callbacks, NULL))) > { > + if (!(mon = qemuMonitorOpen(vm, &config, true, &callbacks, NULL))) { > ret = 0; > goto cleanup; > } > @@ -2673,6 +2677,8 @@ cleanup: > virCommandFree(cmd); > VIR_FREE(monarg); > VIR_FREE(monpath); > + virObjectUnref(vm); Is this virObjectUnref(vm) corresponding to mon->vm = virObjectRef(vm) added in qemuMonitorOpenInternal? If it is, how can we confirm virObjectRef(vm) has been done in qemuMonitorOpenInternal? If an error (anyone follows)happened in qemuMonitorOpenInternal is before mon->vm = virObjectRef(vm), then we still goto cleanup and do virObjectUnref(vm), the refs will be wrong. Am I right? if (!cb->eofNotify) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("EOF notify callback must be supplied")); return NULL; } if (!cb->errorNotify) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("Error notify callback must be supplied")); return NULL; } if (qemuMonitorInitialize() < 0) return NULL; if (!(mon = virObjectLockableNew(qemuMonitorClass))) return NULL; mon->fd = -1; mon->logfd = -1; if (virCondInit(&mon->notify) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("cannot initialize monitor condition")); goto cleanup; } mon->fd = fd; mon->hasSendFD = hasSendFD; mon->vm = virObjectRef(vm); > + virObjectUnref(xmlopt); > > if (pid != 0) { > char ebuf[1024]; > diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c > index a601ee0..2bafe28 100644 > --- a/src/qemu/qemu_monitor.c > +++ b/src/qemu/qemu_monitor.c > @@ -255,6 +255,8 @@ static void qemuMonitorDispose(void *obj) > VIR_DEBUG("mon=%p", mon); > if (mon->cb && mon->cb->destroy) > (mon->cb->destroy)(mon, mon->vm, mon->callbackOpaque); > + virObjectUnref(mon->vm); > + > virCondDestroy(&mon->notify); > VIR_FREE(mon->buffer); > virJSONValueFree(mon->options); > @@ -781,7 +783,7 @@ qemuMonitorOpenInternal(virDomainObjPtr vm, > } > mon->fd = fd; > mon->hasSendFD = hasSendFD; > - mon->vm = vm; > + mon->vm = virObjectRef(vm); > mon->json = json; > if (json) > mon->waitGreeting = true; > -- > 1.8.1.5 > > -- > libvir-list mailing list > libvir-list@xxxxxxxxxx > https://www.redhat.com/mailman/listinfo/libvir-list -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list