Re: [BUG] qemu quited unexpectedly will cause libvirtd crashed

Eric Blake <eblake@xxxxxxxxxx> · Thu, 31 Mar 2011 12:54:02 -0600

On 03/31/2011 12:42 AM, Wen Congyang wrote:
>>> So I try to use gdb and add sleep() to trigger this bug. I have posted two patches
>>> to fix 2 bugs. But there is still another bug, and I have no good way to fix it.
>>
>>> Steps to reproduce this bug:
>>> 1. use gdb to attach libvirtd, and set a breakpoint in the function
>>>    qemuConnectMonitor()
>>> 2. start a vm
>>> 3. let the libvirtd to run until qemuMonitorSetCapabilities() returns.
>>> 4. kill the qemu process
>>> 5. step into qemuDomainObjExitMonitorWithDriver(), and set debug to 1
>>>
>>> Now, qemuDomainObjExitMonitorWithDriver() will sleep 100s to make sure
>>> qemuProcessHandleMonitorEOF() is done before qemuProcessHandleMonitorEOF()
>>> returns.
>>>
>>> priv->mon will be null after qemuDomainObjExitMonitorWithDriver() returns.
>>> So we must not use it. Unfortunately we still use it, and it will cause
>>> libvirtd crashed.
>>
>> Sounds like qemuConnectMonitor needs an extra reference around priv->mon
>> for the duration of the connect attempt, so that
>> qemuProcessHandleMonitorEOF will not free the monitor.
> 
> No, qemuConnectMonitor() calls qemuDomainObjEnterMonitorWithDriver() to hold
> an extra reference around priv->mon, and release it in qemuDomainObjExitMonitorWithDriver().
> But qemuDomainObjExitMonitorWithDriver() does not tell the caller whether
> priv->mon can be used.
> 
> If we will call qemuDomainObjEnterMonitorWithDriver()/qemuDomainObjEnterMonitor(),
> I think we must check whether the domain is active. If we call them more than once,
> we must check it every time. And we should not do other things between checking whether
> the domain is active and calling them(If we do as this, the code can be maintained easily)

I think I hit on the same problem earlier, of a guest modifying vm state
on assumption that a successful monitor command even if the vm went down
in the window after the monitor command completed but before lock was
regained:

https://www.redhat.com/archives/libvir-list/2011-March/msg00636.html

> 
> But some codes does not respect this rule.
> 
> Here is the list of the functions that may have the similar problem:
> 1. qemu_migration.c
>    doNativeMigrate()
>    qemuMigrationToFile()

My current thoughts are that maybe qemuDomainObjExitMonitor should be
made to return the value of vm active after regaining lock, and marked
ATTRIBUTE_RETURN_CHECK, to force all other callers to detect the case of
a monitor command completing successfully but then the VM disappearing
in the window between command completion and regaining the vm lock.

-- 
Eric Blake   eblake@xxxxxxxxxx    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

Attachment:
signature.asc

Description: OpenPGP digital signature
--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list