> -----Original Message-----
> From: Michal Privoznik [mailto:mprivozn@xxxxxxxxxx]
> Sent: Monday, March 5, 2018 8:09 PM
> To: Cordius Wu; 'Wuzongyong (Euler Dept)'; libvir-list@xxxxxxxxxx
> Cc: 'Wanzongshun (Vincent)'; 'weijinfen'
> Subject: Re: [Question]Libvirt doesn't care about qemu monitor
> event if fail to destroy qemu process
>
> On 03/05/2018 12:43 PM, Cordius Wu wrote:
> >>>> On 03/05/2018 03:20 AM, Wuzongyong (Euler Dept) wrote:
> >>>>> Hi,
> >>>>>
> >>>>> We unregister the qemu monitor after sending
> >>>>> QEMU_PROCESS_EVENT_MONITOR_EOF to workerPool:
> >>>>>
> >>>>> static void
> >>>>> qemuProcessHandleMonitorEOF(qemuMonitorPtr mon,
> >>>>>                             virDomainObjPtr vm,
> >>>>>                             void *opaque)
> >>>>> {
> >>>>>     virQEMUDriverPtr driver = opaque;
> >>>>>     qemuDomainObjPrivatePtr priv;
> >>>>>     struct qemuProcessEvent *processEvent;
> >>>>>     ...
> >>>>>     processEvent->eventType = QEMU_PROCESS_EVENT_MONITOR_EOF;
> >>>>>     processEvent->vm = vm;
> >>>>>
> >>>>>     virObjectRef(vm);
> >>>>>     if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) {
> >>>>>         ignore_value(virObjectUnref(vm));
> >>>>>         VIR_FREE(processEvent);
> >>>>>         goto cleanup;
> >>>>>     }
> >>>>>
> >>>>>     /* We don't want this EOF handler to be called over and over while the
> >>>>>      * thread is waiting for a job.
> >>>>>      */
> >>>>>     qemuMonitorUnregister(mon);
> >>>>>     ...
> >>>>> }
> >>>>>
> >>>>> Then we handle QEMU_PROCESS_EVENT_MONITOR_EOF in the
> >>>>> processMonitorEOFEvent function:
> >>>>>
> >>>>> static void
> >>>>> processMonitorEOFEvent(virQEMUDriverPtr driver,
> >>>>>                        virDomainObjPtr vm)
> >>>>> {
> >>>>>     ...
> >>>>>     if (qemuProcessBeginStopJob(driver, vm, QEMU_JOB_DESTROY, true) < 0)
> >>>>>         return;
> >>>>>     ...
> >>>>> }
> >>>>>
> >>>>> Here, libvirt will show that the vm state is running all the time
> >>>>> if qemuProcessBeginStopJob returns -1, even though qemu may
> >>>>> terminate or be killed later.
> >>>>>
> >>>>> So, maybe we should re-register the monitor when
> >>>>> qemuProcessBeginStopJob fails?
> >>>>
> >>>> The fact that processMonitorEOFEvent() failed to grab DESTROY job
> >>>> means that we screwed up earlier and now you're just seeing effects
> >>>> of it. Threads should be able to acquire DESTROY job at any point,
> >>>> regardless of other jobs set on the domain object.
> >>>>
> >>>> Can you please:
> >>>> a) try to turn on debug logs [1] and tell us why acquiring DESTROY
> >>>> job failed? You should see an error message like this:
> >>>>
> >>>>   error: cannot acquire state change lock ..
> >>>>
> >>>> b) tell us what is your libvirt version and if you're able to
> >>>> reproduce this with the latest git HEAD?
> >>>>
> >>>
> >>> By "qemuProcessBeginStopJob failed" I mean that:
> >>
> >> Oh, I thought that the message you've sent earlier is related to this:
> >>
> >> https://www.redhat.com/archives/libvir-list/2018-March/msg00148.html
> >>
> >> So you are not accidentally sending SIGKILL to qemu then?
> >
> > Yep, I send SIGKILL to qemu from outside. The 'accident' means that the
> > scenario where libvirt reports the vm as running all the time is
> > hard to reproduce. In the past month, I reproduced it just twice.
> >
> >>> we failed to kill the qemu process in 15 seconds (refer to
> >>> virProcessKillPainfully).
> >>> IOW, we send SIGTERM and SIGKILL but the qemu process doesn't exit
> >>> in 15s, and then libvirt will think qemu is still in running state
> >>> even though qemu does exit after the 15s loop in
> >>> virProcessKillPainfully.
> >>
> >> What state is qemu process in then? I mean, how can we see EOF if the
> >> process still exists?
> >>
> > I sent SIGKILL to the qemu process, but the qemu process didn't exit
> > immediately. The command 'ps -ef | grep qemu' showed that the qemu
> > process was in defunct state.
>
> Ah, so you can find the process, but it is in D state.
> Because I read the email linked above like qemu is gone.

Yep

> > Then about 20s-30s after sending the SIGKILL, the qemu process
> > exited and I couldn't find the qemu info through the ps command.
> > So, libvirt still thinks the qemu process is alive in the 15s loop
> > in virProcessKillPainfully.
>
> Ah, so IIUC, qemu has closed the monitor but right after that it went to
> the D state instead of quitting. Meanwhile, libvirt sees EOF on the monitor
> but is unable to kill the process.

Right

> Well, registering EOF handler back would be only a workaround, because if
> you register EOF handler back the event loop will do a busy wait (in each
> iteration it will see EOF), so eventually the
> virProcessKillPainfully() will see the process gone and
> qemuProcessBeginStopJob() would be able to return successfully.
>
> I'm unsure what the right fix might be though. Maybe, at EOF we can check
> what state is qemu process in and if it's in D state don't try to kill it
> and continue with BeginJob() call.
>
> Michal

Hmm, I can't come up with a better solution for this problem, so I would
appreciate it if somebody could help solve it.
BTW, how can we check whether a process is in D state in libvirt?

Thanks,
Wu Zongyong
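
On the closing question of how to tell whether a process is in D state:
below is a minimal sketch, assuming Linux with /proc mounted. The third
field of /proc/<pid>/stat is a one-character state, where 'D' is
uninterruptible sleep and 'Z' is a zombie (what ps prints as <defunct>).
The helper names parseStatState() and procStateChar() are hypothetical
and are not existing libvirt functions.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper (not a libvirt API): extract the state character
 * from the contents of a /proc/<pid>/stat line. The comm field is
 * wrapped in parentheses and may itself contain spaces or ')', so we
 * search for the LAST ')' and take the character after the space that
 * follows it. Returns 0 if the line cannot be parsed. */
static char
parseStatState(const char *line)
{
    const char *p = strrchr(line, ')');

    if (!p || p[1] != ' ' || p[2] == '\0')
        return 0;
    return p[2];
}

/* Hypothetical helper (not a libvirt API): return the state character
 * of @pid ('R', 'S', 'D', 'Z', ...), or 0 if the process is gone or
 * /proc/<pid>/stat cannot be read. */
static char
procStateChar(pid_t pid)
{
    char path[64];
    char buf[512];
    FILE *fp;
    size_t n;

    snprintf(path, sizeof(path), "/proc/%ld/stat", (long) pid);
    if (!(fp = fopen(path, "r")))
        return 0;

    /* The state field sits within the first few dozen bytes, so a
     * truncated read of a long stat line is still sufficient. */
    n = fread(buf, 1, sizeof(buf) - 1, fp);
    fclose(fp);
    buf[n] = '\0';

    return parseStatState(buf);
}
```

Under these assumptions, a caller in the EOF path could skip the kill
attempt when procStateChar(pid) returns 'D' or 'Z'. This only illustrates
the idea discussed above and is not a proposed patch.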