On 03/05/2018 01:21 PM, Cordius Wu wrote:
>
>> -----Original Message-----
>> From: Michal Privoznik [mailto:mprivozn@xxxxxxxxxx]
>> Sent: Monday, March 5, 2018 8:09 PM
>> To: Cordius Wu; 'Wuzongyong (Euler Dept)'; libvir-list@xxxxxxxxxx
>> Cc: 'Wanzongshun (Vincent)'; 'weijinfen'
>> Subject: Re: [Question]Libvirt doesn't care about qemu monitor
>> event if fail to destroy qemu process
>>
>> On 03/05/2018 12:43 PM, Cordius Wu wrote:
>>>>>> On 03/05/2018 03:20 AM, Wuzongyong (Euler Dept) wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> We unregister qemu monitor after sending
>>>>>>> QEMU_PROCESS_EVENT_MONITOR_EOF to workerPool:
>>>>>>>
>>>>>>> static void
>>>>>>> qemuProcessHandleMonitorEOF(qemuMonitorPtr mon,
>>>>>>>                             virDomainObjPtr vm,
>>>>>>>                             void *opaque)
>>>>>>> {
>>>>>>>     virQEMUDriverPtr driver = opaque;
>>>>>>>     qemuDomainObjPrivatePtr priv;
>>>>>>>     struct qemuProcessEvent *processEvent;
>>>>>>>     ...
>>>>>>>     processEvent->eventType = QEMU_PROCESS_EVENT_MONITOR_EOF;
>>>>>>>     processEvent->vm = vm;
>>>>>>>
>>>>>>>     virObjectRef(vm);
>>>>>>>     if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) {
>>>>>>>         ignore_value(virObjectUnref(vm));
>>>>>>>         VIR_FREE(processEvent);
>>>>>>>         goto cleanup;
>>>>>>>     }
>>>>>>>
>>>>>>>     /* We don't want this EOF handler to be called over and over while the
>>>>>>>      * thread is waiting for a job.
>>>>>>>      */
>>>>>>>     qemuMonitorUnregister(mon);
>>>>>>>     ...
>>>>>>> }
>>>>>>>
>>>>>>> Then we handle QEMU_PROCESS_EVENT_MONITOR_EOF in
>>>>>>> processMonitorEOFEvent function:
>>>>>>>
>>>>>>> static void
>>>>>>> processMonitorEOFEvent(virQEMUDriverPtr driver,
>>>>>>>                        virDomainObjPtr vm)
>>>>>>> {
>>>>>>>     ...
>>>>>>>     if (qemuProcessBeginStopJob(driver, vm, QEMU_JOB_DESTROY, true) < 0)
>>>>>>>         return;
>>>>>>>     ...
>>>>>>> }
>>>>>>>
>>>>>>> Here, libvirt will show that the vm state is running all the time
>>>>>>> if qemuProcessBeginStopJob return -1 even though qemu may
>>>>>>> terminate or be killed later.
>>>>>>>
>>>>>>> So, maybe we should re-register the monitor when
>>>>>>> qemuProcessBeginStopJob failed?
>>>>>>
>>>>>> The fact that processMonitorEOFEvent() failed to grab DESTROY job
>>>>>> means that we screwed up earlier and now you're just seeing effects
>>>>>> of it.
>>>>>> Threads should be able to acquire DESTROY job at any point,
>>>>>> regardless of other jobs set on the domain object.
>>>>>>
>>>>>> Can you please:
>>>>>> a) try to turn on debug logs [1] and tell us why acquiring DESTROY
>>>>>> job failed? You should see an error message like this:
>>>>>>
>>>>>> error: cannot acquire state change lock ..
>>>>>>
>>>>>> b) tell us what is your libvirt version and if you're able to
>>>>>> reproduce this with the latest git HEAD?
>>>>>>
>>>>>
>>>>> I said " qemuProcessBeginStopJob failed" means that:
>>>>
>>>> Oh, I thought that the message you've sent earlier is related to this:
>>>>
>>>> https://www.redhat.com/archives/libvir-list/2018-March/msg00148.html
>>>>
>>>> So you are not accidentally sending SIGKILL to qemu then?
>>>
>>> Yep, I send SIGKILL to qemu from outside. The 'accident' means that the
>>> scenario where libvirt indicates the vm is in running state all the
>>> time is hard to reproduce. In the past month, I reproduced it only twice.
>>>
>>>>> we failed to kill qemu process in 15 seconds (refer to
>>>>> virProcessKillPainfully).
>>>>> IOW, we send SIGTERM and SIGKILL but the qemu process doesn't exit
>>>>> in 15s, and then libvirt will think qemu is still in running state
>>>>> even though qemu does indeed exit after the 15s loop in
>>>>> virProcessKillPainfully.
>>>>
>>>> What state is qemu process in then? I mean, how can we see EOF if the
>>>> process still exists?
>>>>
>>> I send SIGKILL to qemu process, but the qemu process didn't exit
>>> immediately. The command 'ps -ef | grep qemu' shows that the qemu
>>> process is in defunct state.
>>
>> Ah, so you can find the process, but it is in D state.
>> Because I read the email linked above like qemu is gone.
>
> Yep
>
>>> Then about
>>> 20s-30s after sending the SIGKILL the qemu process exited and I can't
>>> find the qemu info through the ps command.
>>> So, libvirt still thinks the qemu process is alive in the 15s loop
>>> in virProcessKillPainfully.
>>
>> Ah, so IIUC, qemu has closed the monitor but right after that it went to
>> the D state instead of quitting. Meanwhile, libvirt sees EOF on the
>> monitor but is unable to kill the process.
>
> Right
>
>> Well, registering EOF handler back would be only a workaround, because
>> if you register EOF handler back the event loop will do a busy wait (in
>> each iteration it will see EOF), so eventually the
>> virProcessKillPainfully() will see the process gone and
>> qemuProcessBeginStopJob() would be able to return successfully.
>>
>> I'm unsure what the right fix might be though. Maybe, at EOF we can
>> check what state is qemu process in and if it's in D state don't try to
>> kill it and continue with BeginJob() call.
>>
>> Michal
>
> Hmm, I can't come up with a better solution for this problem, so I wish
> somebody could help to solve this problem.
> BTW, how to check a process is in D state in libvirt?

By reading /proc/$pid/status. Although this would work only on Linux,
not *BSD. On the other hand, I'm not sure *BSD has D state.

Michal

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
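
For illustration, here is a minimal sketch of the /proc/$pid/status check
suggested above, assuming Linux procfs. The helper names parse_state_line()
and proc_state() are hypothetical and are not existing libvirt functions.
The "State:" line carries a one-letter code, for example D (disk sleep,
i.e. uninterruptible) or Z (zombie, which ps reports as "defunct"), so a
caller may want to look at both letters.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper (not part of libvirt): extract the state letter
 * from one line of /proc/<pid>/status, e.g. "State:\tD (disk sleep)".
 * Returns the letter, or 0 if @line is not a usable State line. */
static char
parse_state_line(const char *line)
{
    const char *p;

    if (strncmp(line, "State:", 6) != 0)
        return 0;

    /* Skip the whitespace between the key and the value. */
    p = line + 6;
    while (*p == ' ' || *p == '\t')
        p++;

    return *p == '\n' ? 0 : *p;
}

/* Hypothetical helper (not part of libvirt): return the state letter of
 * @pid, or 0 if the process is gone or the file cannot be read.
 * Linux only, since it relies on procfs. */
static char
proc_state(pid_t pid)
{
    char path[64];
    char line[256];
    char state = 0;
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/%ld/status", (long) pid);
    if (!(fp = fopen(path, "r")))
        return 0;

    /* Scan line by line until the State entry is found. */
    while (fgets(line, sizeof(line), fp)) {
        if ((state = parse_state_line(line)) != 0)
            break;
    }

    fclose(fp);
    return state;
}
```

A caller at EOF time could then compare proc_state(pid) against 'D' or 'Z'
before deciding whether to try killing the process. Whether that decision
belongs in processMonitorEOFEvent() or elsewhere is left open here, as it
is in the discussion above.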