When I edit the domain's config file like this:

=====================
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/test3.img'/>
  <target dev='sdb' bus='scsi'/>
  <address type='drive' controller='0' bus='0' unit='10'/>
</disk>
=====================

Note that the unit is wrong, but libvirt does not check it. When I start the
vm with this wrong config file, libvirtd blocks because qemu quits
unexpectedly. The bug does not happen every time; it has only happened once
on my box. So I used gdb and added a sleep() to trigger it. I have posted two
patches to fix 2 bugs, but there is still another bug, and I have no good way
to fix it.

I added sleep() in qemuDomainObjExitMonitorWithDriver():

==============================
int qemuDomainObjExitMonitorWithDriver(struct qemud_driver *driver,
                                       virDomainObjPtr obj)
{
    qemuDomainObjPrivatePtr priv = obj->privateData;
    int refs;
    int debug = 0;

    refs = qemuMonitorUnref(priv->mon);

    if (refs > 0)
        qemuMonitorUnlock(priv->mon);

    /* Note: qemu may quit unexpectedly here, and the monitor will be freed.
     * If that happens, priv->mon will be NULL. */
    if (debug)
        sleep(100);

    qemuDriverLock(driver);
    virDomainObjLock(obj);

    if (refs == 0) {
        priv->mon = NULL;
    }
}
==============================

Steps to reproduce this bug:
1. use gdb to attach to libvirtd, and set a breakpoint in the function
   qemuConnectMonitor()
2. start a vm
3. let libvirtd run until qemuMonitorSetCapabilities() returns
4. kill the qemu process
5. step into qemuDomainObjExitMonitorWithDriver(), and set debug to 1

Now qemuDomainObjExitMonitorWithDriver() will sleep for 100s, which makes
sure that qemuProcessHandleMonitorEOF() is done before
qemuDomainObjExitMonitorWithDriver() returns. priv->mon will be NULL after
qemuDomainObjExitMonitorWithDriver() returns, so we must not use it.
Unfortunately we still use it, and that causes libvirtd to crash.

My first idea for a fix is to make qemuDomainObjExitMonitorWithDriver()
return -1 in this case, and have the caller check the return value, do some
cleanup, and return an error. Unfortunately, the cleanup code may itself use
priv->mon. The only way I can see to avoid that is to add a local variable,
set it when qemu has quit unexpectedly, and avoid using priv->mon in the
cleanup code when it is set.

Is there a simpler way to fix this bug?
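
To make the local-variable idea concrete, here is a minimal sketch in plain C.
It is not libvirt code: every mock_* name is a hypothetical stand-in, locking
is left out, and the race is simulated in a single thread by a callback that
runs at the point where the real function has dropped its locks. It only
shows the shape of "exit returns -1, caller remembers it, cleanup skips
priv->mon".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the monitor and the domain private data. */
typedef struct { int refs; } mock_monitor;
typedef struct { mock_monitor *mon; } mock_domain_private;
typedef void (*race_hook)(mock_domain_private *priv);

/* Drop one reference and free the monitor when the count reaches zero. */
static int mock_monitor_unref(mock_monitor *mon)
{
    int refs = --mon->refs;
    if (refs == 0)
        free(mon);
    return refs;
}

/* Stand-in for the EOF handler: drops the owner's reference. */
static void mock_handle_monitor_eof(mock_domain_private *priv)
{
    if (priv->mon && mock_monitor_unref(priv->mon) == 0)
        priv->mon = NULL;
}

/* Returns 0 if the monitor is still usable afterwards, -1 if it is gone. */
static int mock_exit_monitor(mock_domain_private *priv, race_hook hook)
{
    assert(priv->mon != NULL);
    int refs = mock_monitor_unref(priv->mon);

    if (refs == 0)
        priv->mon = NULL;
    else if (hook)
        hook(priv);   /* the unlocked window where qemu may have quit */

    return priv->mon ? 0 : -1;
}

/* Caller: records the failure in a local flag and keeps cleanup off priv->mon. */
static int mock_start(mock_domain_private *priv, race_hook hook,
                      bool later_step_fails)
{
    bool mon_gone = false;
    int ret = -1;

    priv->mon = malloc(sizeof(*priv->mon));
    if (!priv->mon)
        return -1;
    priv->mon->refs = 1;      /* owner's reference */

    priv->mon->refs++;        /* "enter monitor" takes a second reference */
    if (mock_exit_monitor(priv, hook) < 0) {
        mon_gone = true;
        goto cleanup;
    }
    if (later_step_fails)
        goto cleanup;
    ret = 0;

cleanup:
    if (ret < 0 && !mon_gone) {
        /* Only this path may touch the monitor during cleanup. */
        mock_monitor_unref(priv->mon);
        priv->mon = NULL;
    }
    return ret;
}
```

Calling mock_start(&priv, mock_handle_monitor_eof, false) takes the
"monitor went away" path and returns -1 without dereferencing priv->mon,
while mock_start(&priv, NULL, true) takes the ordinary error path, where
closing the monitor in cleanup is still safe.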