On 2015/10/28 5:45, John Ferlan wrote: > > > On 10/02/2015 08:17 AM, John Ferlan wrote: >> >> >> On 09/26/2015 08:18 AM, Wang Yufei wrote: >>> We shutdown a VM A by qemu agent,meanwhile an agent EOF >>> of VM A happened, there's a chance that deadlock occurred: >>> >>> qemuProcessHandleAgentEOF in main thread >>> A) priv->agent = NULL; //A happened before B >>> >>> //deadlock when we get agent lock which's held by worker thread >>> qemuAgentClose(agent); >>> >>> qemuDomainObjExitAgent called by qemuDomainShutdownFlags in worker thread >>> B) hasRefs = virObjectUnref(priv->agent); //priv->agent is NULL, return false >>> >>> if (hasRefs) >>> virObjectUnlock(priv->agent); //agent lock will not be released here >>> >>> So I close agent first, then set priv->agent NULL to fix the deadlock. >>> >>> Signed-off-by: Wang Yufei <james.wangyufei@xxxxxxxxxx> >>> Reviewed-by: Ren Guannan <renguannan@xxxxxxxxxx> >>> --- >>> src/qemu/qemu_process.c | 3 +-- >>> 1 file changed, 1 insertion(+), 2 deletions(-) >>> >> >> Interesting - this is the exact opposite of commit id '1020a504' from >> Michal over 3 years ago. >> >> However, a bit of digging into the claim from the commit message drove >> me to commit id '362d04779c' which removes the domain object lock that >> was the basis of the initial patch. >> >> While I'm not an expert in the vmagent code, I do note that the >> HandleAgentEOF code checks for !priv->agent and priv->beingDestroyed, >> while the ExitAgent code doesn't necessarily (or directly) check whether >> the priv->agent is still valid (IOW: that nothing else has removed it >> already like the EOF). >> >> So, while I don't discount that the patch works - I'm wondering whether >> the smarts/logic should be built into ExitAgent to check for >> !priv->agent (and or something else) that would indicate we're already >> on the path of shutdown. >> >> John >> >> > > Ironically after spinning a few cycles and generating another patch, it > seems my initial instincts were correct with respect to commit > 362d04779c removing the reason/cause for 1020a504 and thus moving the > qemuAgentClose is the correct option. > > Of course I'm not sure what caused me to doubt my first thoughts and > start down the path of trying different ways to resolve this. I think I > somehow got it stuck in my head that AgentDestroy still was taking out a > lock. Oh well, I did learn something at least with respect to how this > code works - so that part is good. I can also answer my own question > with respect whether ExitAgent needs to check priv->agent and/or > priv->beingDestroyed. > > With this patch, the EOF code will take out the vm-lock, then attempt to > take out the agent-lock, but will be 'stuck' waiting for the AgentExit > code to run. Once ExitAgent is called, it will remove it's reference and > unlock. I don't believe there's a way for the !hasRefs to be true, so I > suppose it could be removed... > > I'll hold off on pushing to allow pkrempa to make any comments. > > John > > Thanks all! At last we do something correct, that's enough. >>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c >>> index f2586a1..8c9622e 100644 >>> --- a/src/qemu/qemu_process.c >>> +++ b/src/qemu/qemu_process.c >>> @@ -150,11 +150,10 @@ qemuProcessHandleAgentEOF(qemuAgentPtr agent, >>> goto unlock; >>> } >>> >>> + qemuAgentClose(agent); >>> priv->agent = NULL; >>> >>> virObjectUnlock(vm); >>> - >>> - qemuAgentClose(agent); >>> return; >>> >>> unlock: >>> >> >> -- >> libvir-list mailing list >> libvir-list@xxxxxxxxxx >> https://www.redhat.com/mailman/listinfo/libvir-list >> > > . > -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list