Thanks,
Zongyong Wu

> >> On 03/05/2018 12:43 PM, Cordius Wu wrote:
> >>>>>> On 03/05/2018 03:20 AM, Wuzongyong (Euler Dept) wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> We unregister the qemu monitor after sending
> >>>>>>> QEMU_PROCESS_EVENT_MONITOR_EOF to workerPool:
> >>>>>>>
> >>>>>>> static void
> >>>>>>> qemuProcessHandleMonitorEOF(qemuMonitorPtr mon,
> >>>>>>>                             virDomainObjPtr vm,
> >>>>>>>                             void *opaque)
> >>>>>>> {
> >>>>>>>     virQEMUDriverPtr driver = opaque;
> >>>>>>>     qemuDomainObjPrivatePtr priv;
> >>>>>>>     struct qemuProcessEvent *processEvent;
> >>>>>>>     ...
> >>>>>>>     processEvent->eventType = QEMU_PROCESS_EVENT_MONITOR_EOF;
> >>>>>>>     processEvent->vm = vm;
> >>>>>>>
> >>>>>>>     virObjectRef(vm);
> >>>>>>>     if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) {
> >>>>>>>         ignore_value(virObjectUnref(vm));
> >>>>>>>         VIR_FREE(processEvent);
> >>>>>>>         goto cleanup;
> >>>>>>>     }
> >>>>>>>
> >>>>>>>     /* We don't want this EOF handler to be called over and over
> >>>>>>>      * while the thread is waiting for a job.
> >>>>>>>      */
> >>>>>>>     qemuMonitorUnregister(mon);
> >>>>>>>     ...
> >>>>>>> }
> >>>>>>>
> >>>>>>> Then we handle QEMU_PROCESS_EVENT_MONITOR_EOF in the
> >>>>>>> processMonitorEOFEvent function:
> >>>>>>>
> >>>>>>> static void
> >>>>>>> processMonitorEOFEvent(virQEMUDriverPtr driver,
> >>>>>>>                        virDomainObjPtr vm)
> >>>>>>> {
> >>>>>>>     ...
> >>>>>>>     if (qemuProcessBeginStopJob(driver, vm, QEMU_JOB_DESTROY, true) < 0)
> >>>>>>>         return;
> >>>>>>>     ...
> >>>>>>> }
> >>>>>>>
> >>>>>>> Here, libvirt will show the vm state as running all the time if
> >>>>>>> qemuProcessBeginStopJob returns -1, even though qemu may
> >>>>>>> terminate or be killed later.
> >>>>>>>
> >>>>>>> So, maybe we should re-register the monitor when
> >>>>>>> qemuProcessBeginStopJob fails?
> >>>>>>
> >>>>>> The fact that processMonitorEOFEvent() failed to grab DESTROY job
> >>>>>> means that we screwed up earlier and now you're just seeing
> >>>>>> effects of it. Threads should be able to acquire DESTROY job at
> >>>>>> any point, regardless of other jobs set on the domain object.
> >>>>>>
> >>>>>> Can you please:
> >>>>>> a) try to turn on debug logs [1] and tell us why acquiring
> >>>>>> DESTROY job failed? You should see an error message like this:
> >>>>>>
> >>>>>> error: cannot acquire state change lock ..
> >>>>>>
> >>>>>> b) tell us what is your libvirt version and if you're able to
> >>>>>> reproduce this with the latest git HEAD?
> >>>>>>
> >>>>>
> >>>>> By "qemuProcessBeginStopJob failed" I mean that:
> >>>>
> >>>> Oh, I thought that the message you've sent earlier is related to this:
> >>>>
> >>>> https://www.redhat.com/archives/libvir-list/2018-March/msg00148.html
> >>>>
> >>>> So you are not accidentally sending SIGKILL to qemu then?
> >>>
> >>> Yep, I send SIGKILL to qemu from outside. The 'accident' is that the
> >>> scenario where libvirt reports the vm as running all the time is
> >>> hard to reproduce. In the past month, I reproduced it only twice.
> >>>
> >>>>> we failed to kill the qemu process in 15 seconds (refer to
> >>>>> virProcessKillPainfully).
> >>>>> IOW, we send SIGTERM and SIGKILL but the qemu process doesn't exit
> >>>>> in 15s, and then libvirt will think qemu is still in running state
> >>>>> even though qemu does exit after the 15s loop in
> >>>>> virProcessKillPainfully.
> >>>>
> >>>> What state is the qemu process in then? I mean, how can we see EOF
> >>>> if the process still exists?
> >>>>
> >>> I send SIGKILL to the qemu process, but the qemu process didn't exit
> >>> immediately. The command 'ps -ef | grep qemu' shows that the qemu
> >>> process is in defunct state.
> >>
> >> Ah, so you can find the process, but it is in D state.
> >> Because I read the email linked above like qemu is gone.
> >
> > Yep
> >
> >>> Then about 20s-30s after sending the SIGKILL the qemu process exited
> >>> and I can't find the qemu info through the ps command.
> >>> So, libvirt still thinks the qemu process is alive in the 15s
> >>> loop in virProcessKillPainfully.
> >>
> >> Ah, so IIUC, qemu has closed the monitor but right after that it went
> >> to the D state instead of quitting. Meanwhile, libvirt sees EOF on
> >> the monitor but is unable to kill the process.
> >
> > Right
> >
> >> Well, registering EOF handler back would be only a workaround,
> >> because if you register EOF handler back the event loop will do a
> >> busy wait (in each iteration it will see EOF), so eventually
> >> virProcessKillPainfully() will see the process gone and
> >> qemuProcessBeginStopJob() would be able to return successfully.
> >>
> >> I'm unsure what the right fix might be though. Maybe, at EOF we can
> >> check what state the qemu process is in and if it's in D state don't
> >> try to kill it and continue with BeginJob() call.
> >>
> >> Michal
> >
> > Hmm, I can't come up with a better solution for this problem, so I
> > hope somebody can help to solve it.
> > BTW, how can we check whether a process is in D state in libvirt?
>
> By reading /proc/$pid/status. Although this would work only on Linux, not
> *BSD. On the other hand, I'm not sure *BSD has D state.
>
> Michal

Hmmm, is a process marked as defunct in Z state instead of D state?

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
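
For reference, a minimal sketch of the /proc/$pid/status check suggested above, assuming Linux. The helper names parse_state_line() and read_proc_state() are hypothetical and are not existing libvirt functions. On Linux the "State:" line carries a one-letter code: D is uninterruptible sleep, while Z is a zombie, which is what ps labels <defunct>.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: extract the one-letter state code from a
 * "State:" line of /proc/<pid>/status, e.g. "State:\tZ (zombie)".
 * Returns 0 if the line is not a usable State line. */
char
parse_state_line(const char *line)
{
    const char *p;

    if (strncmp(line, "State:", 6) != 0)
        return 0;

    p = line + 6;
    while (*p == ' ' || *p == '\t')
        p++;

    if (*p == '\0' || *p == '\n')
        return 0;

    return *p;   /* 'R', 'S', 'D', 'Z', 'T', ... */
}

/* Hypothetical helper: return the state code of @pid, or 0 if the
 * status file cannot be read (process gone, or not Linux). */
char
read_proc_state(pid_t pid)
{
    char path[64];
    char line[256];
    char state = 0;
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/%ld/status", (long) pid);

    if (!(fp = fopen(path, "r")))
        return 0;

    while (fgets(line, sizeof(line), fp)) {
        if ((state = parse_state_line(line)) != 0)
            break;
    }

    fclose(fp);
    return state;
}
```

A caller could compare the result of read_proc_state(pid) against 'D' or 'Z' before deciding whether to keep waiting for the process. A return value of 0 only means the status file could not be read, so it should not be taken as proof that the process has exited.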