On 03/05/2018 10:26 AM, Michal Privoznik wrote:
> On 03/05/2018 03:20 AM, Wuzongyong (Euler Dept) wrote:
>> Hi,
>>
>> We unregister the qemu monitor after sending QEMU_PROCESS_EVENT_MONITOR_EOF
>> to the workerPool:
>>
>> static void
>> qemuProcessHandleMonitorEOF(qemuMonitorPtr mon,
>>                             virDomainObjPtr vm,
>>                             void *opaque)
>> {
>>     virQEMUDriverPtr driver = opaque;
>>     qemuDomainObjPrivatePtr priv;
>>     struct qemuProcessEvent *processEvent;
>>     ...
>>     processEvent->eventType = QEMU_PROCESS_EVENT_MONITOR_EOF;
>>     processEvent->vm = vm;
>>
>>     virObjectRef(vm);
>>     if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) {
>>         ignore_value(virObjectUnref(vm));
>>         VIR_FREE(processEvent);
>>         goto cleanup;
>>     }
>>
>>     /* We don't want this EOF handler to be called over and over while the
>>      * thread is waiting for a job.
>>      */
>>     qemuMonitorUnregister(mon);
>>     ...
>> }
>>
>> Then we handle QEMU_PROCESS_EVENT_MONITOR_EOF in the processMonitorEOFEvent
>> function:
>>
>> static void
>> processMonitorEOFEvent(virQEMUDriverPtr driver,
>>                        virDomainObjPtr vm)
>> {
>>     ...
>>     if (qemuProcessBeginStopJob(driver, vm, QEMU_JOB_DESTROY, true) < 0)
>>         return;
>>     ...
>> }
>>
>> Here, libvirt will report the vm state as running forever if
>> qemuProcessBeginStopJob returns -1, even though qemu may terminate or be
>> killed later.
>>
>> So, maybe we should re-register the monitor when qemuProcessBeginStopJob
>> fails?
>
> The fact that processMonitorEOFEvent() failed to grab the DESTROY job means
> that we screwed up earlier and now you're just seeing the effects of it.
> Threads should be able to acquire the DESTROY job at any point, regardless
> of other jobs set on the domain object.
>
> Can you please:
> a) turn on debug logs [1] and tell us why acquiring the DESTROY job
> failed? You should see an error message like this:
>
>   error: cannot acquire state change lock ..
>
> b) tell us what your libvirt version is and whether you're able to
> reproduce this with the latest git HEAD?
>
>
> Ha! Looking at the code I think I've found something that might be
> causing this issue. Do you have max_queued set in qemu.conf? Because if
> you do, then qemuDomainObjBeginJobInternal() might fail to set the job
> because it's above the set limit. If I'm right, this should be the fix:
>
> diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
> index 8b4efc82d..7eb631e06 100644
> --- a/src/qemu/qemu_domain.c
> +++ b/src/qemu/qemu_domain.c
> @@ -5401,7 +5401,8 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver,
>          then = now + QEMU_JOB_WAIT_TIME;
>
>   retry:
> -    if (cfg->maxQueuedJobs &&
> +    if ((!async && job == QEMU_JOB_DESTROY) &&
> +        cfg->maxQueuedJobs &&

Oh, this should be reversed (note to myself: don't write any patches
until morning coffee kicks in):

    if ((!async && job != QEMU_JOB_DESTROY) &&
        cfg->maxQueuedJobs &&

Michal

-- 
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list