On 06/25/2017 10:56 PM, Yi Wang wrote: > Libvirt forgets to remove inactive vm when failed to start a defined vm. > That may result in residual domain in driver->domains on such condition: > during the process of starting a vm, undefine it, and qemu exit because > of some exception. As we undefined the vm successfully, the vm->persistent > was set to 0, we will always fail to undefine it until restart libvirtd. This seems to be a strange sequence of operations, but the claim is that by adding this logic to CreateWithFlags, then the problem you're facing is resolved. However, is adding this to the Create logic the right thing to do? IIUC: CreateXMLWithFlags is successful Undefine domain is successful -> Logic at top says, "if !vm->persistent", then fail -> Logic at bottom has: -> vm->persistent = 0 -> if !virDomainObjIsActive then qemuDomainRemoveInactive ... Thus if domain is active, expectation is that RemoveInactive is done when the domain is "normally" shutdown... ... time goes by... QEMU exits w/ some exception -> qemuDomainDestroyFlags isn't run -> ? Thus avoiding some libvirt logic to handle the exception (would that be qemuProcessAutoDestroyAdd assuming VIR_DOMAIN_START_AUTODESTROY was used) ? So your fix is, the next time a CreateWithFlags happens, if anything fails, then domain removal can be done with the assumption that something within ObjStart will fail eventually because Undefine deleted most traces of the guest. Of course if libvirtd restart happens, then because the config no longer exists the domain object isn't created. I wonder if perhaps (yuck) another flag is required to indicate that a domain went from persistent to !persistent in Undefine, but because it was still running the RemoveInactive wasn't called. The interesting part of that is how would the reaping happen. Seems this would be part of some qemuProcessHandle* function. Something that would be resetting vm->def->id and/or running qemuProcessStop (e.g. processing via AutoDestroy) Some more details and understanding why/what qemuProcessHandle* may (or may not be) called I think is necessary. It doesn't seem to me that domain removal for this condition should require going through Start processing, but something in that monitor event processing logic would seem to need to be taught about this possibility. > > Signed-off-by: Yi Wang <wang.yi59@xxxxxxxxxx> > --- > src/qemu/qemu_driver.c | 2 ++ > 1 file changed, 2 insertions(+) > There's two reasons to get to the endjob: label... #1 virDomainObjIsActive (domain already active) #2 qemuDomainObjStart fails So this certainly wouldn't be right for the IsActive condition, under "normal" circumstances. And again, I don't think requiring a subsequent call to Start would be "correct" either. I am curious though if when in this condition, is the vm->def->id set? IOW: What does 'virsh list' indicate or even virsh dominfo $domain? John > diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c > index cdb727b..af8afab 100644 > --- a/src/qemu/qemu_driver.c > +++ b/src/qemu/qemu_driver.c > @@ -7185,6 +7185,8 @@ qemuDomainCreateWithFlags(virDomainPtr dom, unsigned int flags) > > endjob: > qemuProcessEndJob(driver, vm); > + if (ret < 0) > + qemuDomainRemoveInactive(driver, vm); > > cleanup: > virDomainObjEndAPI(&vm); > -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list