> -----Original Message-----
> From: Michal Privoznik [mailto:mprivozn@xxxxxxxxxx]
> Sent: Monday, March 05, 2018 5:27 PM
> To: Wuzongyong (Euler Dept) <cordius.wu@xxxxxxxxxx>; libvir-list@xxxxxxxxxx
> Cc: Wanzongshun (Vincent) <wanzongshun@xxxxxxxxxx>; weijinfen <weijinfen@xxxxxxxxxx>
> Subject: Re: [Question]Libvirt doesn't care about qemu monitor event if fail
> to destroy qemu process
>
> On 03/05/2018 03:20 AM, Wuzongyong (Euler Dept) wrote:
> > Hi,
> >
> > We unregister the qemu monitor after sending QEMU_PROCESS_EVENT_MONITOR_EOF
> > to workerPool:
> >
> > static void
> > qemuProcessHandleMonitorEOF(qemuMonitorPtr mon,
> >                             virDomainObjPtr vm,
> >                             void *opaque)
> > {
> >     virQEMUDriverPtr driver = opaque;
> >     qemuDomainObjPrivatePtr priv;
> >     struct qemuProcessEvent *processEvent;
> >     ...
> >     processEvent->eventType = QEMU_PROCESS_EVENT_MONITOR_EOF;
> >     processEvent->vm = vm;
> >
> >     virObjectRef(vm);
> >     if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) {
> >         ignore_value(virObjectUnref(vm));
> >         VIR_FREE(processEvent);
> >         goto cleanup;
> >     }
> >
> >     /* We don't want this EOF handler to be called over and over while the
> >      * thread is waiting for a job.
> >      */
> >     qemuMonitorUnregister(mon);
> >     ...
> > }
> >
> > Then we handle QEMU_PROCESS_EVENT_MONITOR_EOF in the
> > processMonitorEOFEvent function:
> >
> > static void
> > processMonitorEOFEvent(virQEMUDriverPtr driver,
> >                        virDomainObjPtr vm)
> > {
> >     ...
> >     if (qemuProcessBeginStopJob(driver, vm, QEMU_JOB_DESTROY, true) < 0)
> >         return;
> >     ...
> > }
> >
> > Here, libvirt will report that the VM state is running the whole time if
> > qemuProcessBeginStopJob returns -1, even though qemu may terminate or be
> > killed later.
> >
> > So, maybe we should re-register the monitor when qemuProcessBeginStopJob
> > fails?
>
> The fact that processMonitorEOFEvent() failed to grab the DESTROY job means
> that we screwed up earlier, and now you're just seeing the effects of it.
> Threads should be able to acquire the DESTROY job at any point, regardless
> of other jobs set on the domain object.
>
> Can you please:
>
> a) try to turn on debug logs [1] and tell us why acquiring the DESTROY job
>    failed? You should see an error message like this:
>
>    error: cannot acquire state change lock ..
>
> b) tell us what your libvirt version is and whether you're able to reproduce
>    this with the latest git HEAD?

By "qemuProcessBeginStopJob failed" I meant that we failed to kill the qemu
process within 15 seconds (refer to virProcessKillPainfully). IOW, we send
SIGTERM and then SIGKILL, but the qemu process doesn't exit within 15s, and
libvirt will then think qemu is still running even though qemu does exit after
the 15s loop in virProcessKillPainfully, because we have already unregistered
the monitor.

int
qemuProcessBeginStopJob(virQEMUDriverPtr driver,
                        virDomainObjPtr vm,
                        qemuDomainJob job,
                        bool forceKill)
{
    ...
    if (qemuProcessKill(vm, killFlags) < 0)
        goto cleanup;
    ...
}

> Ha! Looking at the code I think I've found something that might be causing
> this issue. Do you have max_queued set in qemu.conf? Because if you do, then
> qemuDomainObjBeginJobInternal() might fail to set the job because it's above
> the set limit. If I'm right, this should be the fix:
>
> diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
> index 8b4efc82d..7eb631e06 100644
> --- a/src/qemu/qemu_domain.c
> +++ b/src/qemu/qemu_domain.c
> @@ -5401,7 +5401,8 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver,
>      then = now + QEMU_JOB_WAIT_TIME;
>
>  retry:
> -    if (cfg->maxQueuedJobs &&
> +    if ((!async && job != QEMU_JOB_DESTROY) &&
> +        cfg->maxQueuedJobs &&
>          priv->jobs_queued > cfg->maxQueuedJobs) {
>          goto error;
>      }
>
> Michal

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list