>>> On 03/05/2018 03:20 AM, Wuzongyong (Euler Dept) wrote:
>>>> Hi,
>>>>
>>>> We unregister the qemu monitor after sending QEMU_PROCESS_EVENT_MONITOR_EOF
>>>> to workerPool:
>>>>
>>>> static void
>>>> qemuProcessHandleMonitorEOF(qemuMonitorPtr mon,
>>>>                             virDomainObjPtr vm,
>>>>                             void *opaque) {
>>>>     virQEMUDriverPtr driver = opaque;
>>>>     qemuDomainObjPrivatePtr priv;
>>>>     struct qemuProcessEvent *processEvent;
>>>>     ...
>>>>     processEvent->eventType = QEMU_PROCESS_EVENT_MONITOR_EOF;
>>>>     processEvent->vm = vm;
>>>>
>>>>     virObjectRef(vm);
>>>>     if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) {
>>>>         ignore_value(virObjectUnref(vm));
>>>>         VIR_FREE(processEvent);
>>>>         goto cleanup;
>>>>     }
>>>>
>>>>     /* We don't want this EOF handler to be called over and over while the
>>>>      * thread is waiting for a job.
>>>>      */
>>>>     qemuMonitorUnregister(mon);
>>>>     ...
>>>> }
>>>>
>>>> Then we handle QEMU_PROCESS_EVENT_MONITOR_EOF in the processMonitorEOFEvent
>>>> function:
>>>>
>>>> static void
>>>> processMonitorEOFEvent(virQEMUDriverPtr driver,
>>>>                        virDomainObjPtr vm) {
>>>>     ...
>>>>     if (qemuProcessBeginStopJob(driver, vm, QEMU_JOB_DESTROY, true) < 0)
>>>>         return;
>>>>     ...
>>>> }
>>>>
>>>> Here, if qemuProcessBeginStopJob returns -1, libvirt keeps reporting the
>>>> VM as running indefinitely, even though qemu may terminate or be killed
>>>> later.
>>>>
>>>> So, maybe we should re-register the monitor when qemuProcessBeginStopJob
>>>> fails?
>>>
>>> The fact that processMonitorEOFEvent() failed to grab DESTROY job means
>>> that we screwed up earlier and now you're just seeing effects of it.
>>> Threads should be able to acquire DESTROY job at any point, regardless of
>>> other jobs set on the domain object.
>>>
>>> Can you please:
>>> a) try to turn on debug logs [1] and tell us why acquiring DESTROY job
>>> failed? You should see an error message like this:
>>>
>>> error: cannot acquire state change lock ..
>>>
>>> b) tell us what your libvirt version is and whether you're able to
>>> reproduce this with the latest git HEAD?
>>>
>>
>> By "qemuProcessBeginStopJob failed" I meant that:
>
> Oh, I thought the message you sent earlier was related to this:
>
> https://www.redhat.com/archives/libvir-list/2018-March/msg00148.html
>
> So you are not accidentally sending SIGKILL to qemu then?

Yes, I send SIGKILL to qemu from outside libvirt. By 'accident' I meant that
the situation where libvirt keeps reporting the VM as running is hard to
reproduce: in the past month, I have only reproduced it twice.

>> we failed to kill the qemu process within 15 seconds (see
>> virProcessKillPainfully). IOW, we send SIGTERM and SIGKILL but the qemu
>> process doesn't exit within 15s, so libvirt thinks qemu is still running,
>> even though qemu does exit after the 15s loop in virProcessKillPainfully.
>
> What state is the qemu process in then? I mean, how can we see EOF if the
> process still exists?
>

I sent SIGKILL to the qemu process, but it didn't exit immediately:
'ps -ef | grep qemu' showed the qemu process in the defunct state. About
20-30s after the SIGKILL, the qemu process exited and I could no longer find
it with ps. So libvirt still thinks the qemu process is alive throughout the
15s loop in virProcessKillPainfully.

Thanks,
Wu Zongyong

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
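The 15s loop in virProcessKillPainfully discussed in this thread can only end early once its liveness check stops seeing the PID. Below is a minimal, self-contained C sketch of why a <defunct> process keeps such a loop busy, assuming (as is the common technique) that the loop probes for the process with a signal-0 kill(); it is not libvirt's code. pid_exists(), kill_and_poll() and zombie_defeats_poll_demo() are hypothetical helpers, and the timings are shrunk from 15s to 500ms.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Existence probe: signal 0 delivers nothing and only performs the
 * permission/existence check. On Linux a <defunct> (zombie) process
 * still owns its PID, so this reports it as existing until it is reaped. */
static bool pid_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return true;
    return errno != ESRCH;   /* EPERM: exists but we may not signal it */
}

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Hypothetical kill-and-poll loop: send SIGTERM, escalate to SIGKILL
 * after kill_after_ms, and poll every step_ms until timeout_ms.
 * Returns 0 once the PID is gone, -1 if it still "exists" at the deadline. */
static int kill_and_poll(pid_t pid, long kill_after_ms,
                         long timeout_ms, long step_ms)
{
    bool sent_kill = false;

    kill(pid, SIGTERM);
    for (long waited = 0; waited < timeout_ms; waited += step_ms) {
        if (!sent_kill && waited >= kill_after_ms) {
            kill(pid, SIGKILL);
            sent_kill = true;
        }
        if (!pid_exists(pid))
            return 0;
        sleep_ms(step_ms);
    }
    return -1;
}

/* Forks a child and runs the loop against it without reaping it, so the
 * killed child lingers as a zombie. Returns 1 if the loop timed out on the
 * dead child and the PID vanished only after waitpid(), 0 otherwise,
 * -1 on a system-call failure. */
static int zombie_defeats_poll_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        pause();             /* child: sleep until a signal kills it */
        _exit(0);
    }

    int polled = kill_and_poll(pid, 200, 500, 50);

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    assert(WIFSIGNALED(status));   /* child died from SIGTERM or SIGKILL */

    return (polled == -1 && !pid_exists(pid)) ? 1 : 0;
}
```

On Linux, zombie_defeats_poll_demo() is expected to return 1: the loop gives up after 500ms although the child was killed almost immediately, and kill(pid, 0) starts failing with ESRCH only after waitpid() reaps the child. In the reported setup, whatever eventually clears the defunct qemu may differ, but a signal-0 probe cannot tell that state apart from a live process.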