On Mon, Jul 19, 2021 at 09:37:01 +0200, Michal Prívozník wrote: > On 7/16/21 5:06 PM, Jiri Denemark wrote: > > Signaling the condition before vm->def->id is reset to -1 is dangerous: > > in case a waiting thread wakes up, it does not see anything interesting > > (the domain is still marked as running) and just enters virDomainObjWait > > where it waits forever because the condition will never be signalled > > again. > > > > Originally it was impossible to get into such situation because the vm > > object was locked all the time between signaling the condition and > > resetting vm->def->id, but after commit 860a999802 released in 6.8.0, > > qemuDomainObjStopWorker called in qemuProcessStop between > > virDomainObjBroadcast and setting vm->def->id to -1 unlocks the vm > > object giving other threads a chance to wake up and possibly hang. > > > > In real world, this can be easily reproduced by killing, destroying, or > > just shutting down (from the guest OS) a domain while it is being > > migrated somewhere else. The migration job would never finish. > > > > We can't fix this by reseting vm->def->id earlier because other > > functions (such as qemuProcessKill) do nothing when the domain is > > already marked as inactive. So let's make sure we delay signaling the > > Not true really. The VIR_QEMU_PROCESS_KILL_NOCHECK flag is passed to > qemuProcessKill which means the check for domain state (aka vm->def->id > == -1) is skipped. Heh, I completely missed this flag when looking at the code :-/ I dropped this part of the commit message. Jirka