On 03/31/2011 12:42 AM, Wen Congyang wrote: >>> So I try to use gdb and add sleep() to trigger this bug. I have posted two patches >>> to fix 2 bugs. But there is still another bug, and I have no good way to fix it. >> >>> Steps to reproduce this bug: >>> 1. use gdb to attach libvirtd, and set a breakpoint in the function >>> qemuConnectMonitor() >>> 2. start a vm >>> 3. let the libvirtd to run until qemuMonitorSetCapabilities() returns. >>> 4. kill the qemu process >>> 5. step into qemuDomainObjExitMonitorWithDriver(), and set debug to 1 >>> >>> Now, qemuDomainObjExitMonitorWithDriver() will sleep 100s to make sure >>> qemuProcessHandleMonitorEOF() is done before qemuProcessHandleMonitorEOF() >>> returns. >>> >>> priv->mon will be null after qemuDomainObjExitMonitorWithDriver() returns. >>> So we must not use it. Unfortunately we still use it, and it will cause >>> libvirtd crashed. >> >> Sounds like qemuConnectMonitor needs an extra reference around priv->mon >> for the duration of the connect attempt, so that >> qemuProcessHandleMonitorEOF will not free the monitor. > > No, qemuConnectMonitor() calls qemuDomainObjEnterMonitorWithDriver() to hold > an extra reference around priv->mon, and release it in qemuDomainObjExitMonitorWithDriver(). > But qemuDomainObjExitMonitorWithDriver() does not tell the caller whether > priv->mon can be used. > > If we will call qemuDomainObjEnterMonitorWithDriver()/qemuDomainObjEnterMonitor(), > I think we must check whether the domain is active. If we call them more than once, > we must check it every time. And we should not do other things between checking whether > the domain is active and calling them(If we do as this, the code can be maintained easily) I think I hit on the same problem earlier, of a guest modifying vm state on assumption that a successful monitor command even if the vm went down in the window after the monitor command completed but before lock was regained: https://www.redhat.com/archives/libvir-list/2011-March/msg00636.html > > But some codes does not respect this rule. > > Here is the list of the functions that may have the similar problem: > 1. qemu_migration.c > doNativeMigrate() > qemuMigrationToFile() My current thoughts are that maybe qemuDomainObjExitMonitor should be made to return the value of vm active after regaining lock, and marked ATTRIBUTE_RETURN_CHECK, to force all other callers to detect the case of a monitor command completing successfully but then the VM disappearing in the window between command completion and regaining the vm lock. -- Eric Blake eblake@xxxxxxxxxx +1-801-349-2682 Libvirt virtualization library http://libvirt.org
Attachment:
signature.asc
Description: OpenPGP digital signature
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list