Re: [Question]Libvirt doesn't care about qemu monitor event if fail to destroy qemu process

"Cordius Wu" <wuzongyo@xxxxxxxxxxxxxxxx> · Mon, 5 Mar 2018 19:43:21 +0800

>>> On 03/05/2018 03:20 AM, Wuzongyong (Euler Dept) wrote:
>>>> Hi,
>>>>
>>>> We unregister the qemu monitor after sending
>>>> QEMU_PROCESS_EVENT_MONITOR_EOF to the workerPool:
>>>>
>>>> static void
>>>> qemuProcessHandleMonitorEOF(qemuMonitorPtr mon,
>>>>                             virDomainObjPtr vm,
>>>>                             void *opaque) {
>>>>     virQEMUDriverPtr driver = opaque;
>>>>     qemuDomainObjPrivatePtr priv;
>>>>     struct qemuProcessEvent *processEvent;
>>>>     ...
>>>>     processEvent->eventType = QEMU_PROCESS_EVENT_MONITOR_EOF;
>>>>     processEvent->vm = vm;
>>>>
>>>>     virObjectRef(vm);
>>>>     if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) {
>>>>         ignore_value(virObjectUnref(vm));
>>>>         VIR_FREE(processEvent);
>>>>         goto cleanup;
>>>>     }
>>>>
>>>>     /* We don't want this EOF handler to be called over and over while the
>>>>      * thread is waiting for a job.
>>>>      */
>>>>     qemuMonitorUnregister(mon);
>>>>     ...
>>>> }
>>>>
>>>> Then we handle QEMU_PROCESS_EVENT_MONITOR_EOF in the
>>>> processMonitorEOFEvent function:
>>>>
>>>> static void
>>>> processMonitorEOFEvent(virQEMUDriverPtr driver,
>>>>                        virDomainObjPtr vm) {
>>>>     ...
>>>>     if (qemuProcessBeginStopJob(driver, vm, QEMU_JOB_DESTROY, true) < 0)
>>>>         return;
>>>>     ...
>>>> }
>>>>
>>>> Here, libvirt will keep reporting the VM state as running if
>>>> qemuProcessBeginStopJob returns -1, even though qemu may terminate or be
>>>> killed later.
>>>>
>>>> So maybe we should re-register the monitor when
>>>> qemuProcessBeginStopJob fails?
>>>
>>> The fact that processMonitorEOFEvent() failed to grab the DESTROY job means
>>> that we screwed up earlier and now you're just seeing the effects of it.
>>> Threads should be able to acquire the DESTROY job at any point, regardless
>>> of other jobs set on the domain object.
>>>
>>> Can you please:
>>> a) try to turn on debug logs [1] and tell us why acquiring DESTROY job
>>> failed? You should see an error message like this:
>>>
>>>   error: cannot acquire state change lock ..
>>>
>>> b) tell us what is your libvirt version and if you're able to reproduce
>>> this with the latest git HEAD?
>>>
>> 
>> By "qemuProcessBeginStopJob failed" I mean that:
>
>Oh, I thought that the message you've sent earlier was related to this:
>
>https://www.redhat.com/archives/libvir-list/2018-March/msg00148.html
>
>So you are not accidentally sending SIGKILL to qemu then?

Yes, I send SIGKILL to qemu from outside libvirt. By 'accident' I mean that
the situation where libvirt keeps reporting the VM as running is hard to
reproduce. In the past month I have reproduced it only twice.

>> we failed to kill the qemu process within 15 seconds (refer to
>> virProcessKillPainfully).
>> IOW, we send SIGTERM and SIGKILL but the qemu process doesn't exit within
>> 15s, and then libvirt thinks qemu is still running even though qemu does
>> exit after the 15s loop in virProcessKillPainfully.
>
>What state is qemu process in then? I mean, how can we see EOF if the
>process still exists?
>
I sent SIGKILL to the qemu process, but it didn't exit immediately. Running
'ps -ef | grep qemu' showed the qemu process in the defunct state. About
20s-30s after the SIGKILL was sent, the qemu process exited and I could no
longer find it in the ps output.
So libvirt still considers the qemu process alive during the 15s loop in
virProcessKillPainfully.
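
To illustrate why a defunct process can still look alive, here is a minimal
standalone sketch. It is not libvirt code. It assumes the liveness check is a
kill()-style probe of the PID, and the helper names probe_alive and
zombie_probe_demo are made up for the example. It only shows that a process
which has exited but has not been reaped still answers kill(pid, 0). It does
not explain why the real qemu process stayed defunct for 20s-30s.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not a libvirt function): returns 1 if kill(pid, 0)
 * reports that the process exists, 0 if it reports ESRCH, and -1 on any
 * other error. */
static int
probe_alive(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;
    return errno == ESRCH ? 0 : -1;
}

/* Hypothetical demo: fork a child that exits at once, probe it while it is
 * still a zombie (defunct), then reap it and probe again.
 * Returns 0 on success, -1 if fork/waitid/waitpid fails. */
static int
zombie_probe_demo(int *alive_before_reap, int *alive_after_reap)
{
    siginfo_t info;
    pid_t pid;

    assert(alive_before_reap && alive_after_reap);

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);               /* child: exit immediately */

    /* Block until the child has exited, but leave it unreaped (WNOWAIT),
     * so it stays in the defunct state. */
    if (waitid(P_PID, pid, &info, WEXITED | WNOWAIT) < 0)
        return -1;

    *alive_before_reap = probe_alive(pid);  /* zombie still has a PID entry */

    /* Reap the child; its PID entry goes away. */
    if (waitpid(pid, NULL, 0) < 0)
        return -1;

    *alive_after_reap = probe_alive(pid);
    return 0;
}
```

Calling zombie_probe_demo(&before, &after) on Linux should leave before == 1
and after == 0. In other words, the probe only starts failing once the
process entry is really gone, which matches what I see with ps.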

Thanks,
Wu Zongyong
