On 4/2/19 10:45 AM, Christian Ehrhardt wrote: > On Mon, Apr 1, 2019 at 4:35 PM Michal Privoznik <mprivozn@xxxxxxxxxx> wrote: >> >> On 4/1/19 4:25 PM, Christian Ehrhardt wrote: >>> Hi, >>> I happened to analyze a bug [1] report I got from a friend and for >>> quite a while it was rather elusive. But I now finally got it >>> reproducible [2] enough to share it with the community. >>> >>> The TL;DR of what I see is: >>> - an automation with python-libvirt gets a SIGINT >>> - cleanup runs destroy and further undefine >>> - the guest closes FDs due to SIGINT and/or destroy which triggers >>> daemonStreamHandleAbort >>> - those two fight over the lock >>> >>> There I get libvirtd into a deadlock which ends up with all threads >>> dead [4] and two of them fighting [3] (details) in particular. >>> >>> The to related stacks summarized are like: >>> >>> daemonStreamHandleWrite (failing to write) >>> -> daemonStreamHandleAbort (closing things and cleaning up) >>> -> ... virChrdevFDStreamCloseCb >>> virMutexLock(&priv->devs->lock); >>> >>> # there is code meant to avoid such issues emitting "Unable to close" >>> if a lock is held >>> # but the log doesn't show this triggering with debug enabled >>> >>> #10 seems triggered via an "undefine" call >>> remoteDispatchDomainUndefine >>> ... -> virChrdevFree >>> ... -> virFDStreamSetInternalCloseCb >>> -> virObjectLock(virFDStreamDataPtr fdst) >>> -> virMutexLock(&obj->lock); >>> # closing all streams of a guest (requiring the same locks) >>> >>> While that already feels quite close I struggle to see where exactly >>> we'd want to fix it. >>> But finally having a repro-script [2] I hope that someone else here >>> might be able to help me with that. >>> >>> After all it is a race - on my s390x system it triggers usually <5 >>> tries, while on x86 I have needed up to 18 runs of the test to hang. >>> Given different system configs it might be better or worse for you. >>> >>> FYI we hit this with libvirt 4.0 initially but libvirt 5.0 was just the same. >>> I haven't built 5.1 or a recent master, but the commits since 5.0 >>> didn't mention any issue that seems related. OTOH I'm willing and able >>> to build and try suggestions if anyone comes up with ideas. >>> >>> [1]: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1822096 >>> [2]: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1822096/+attachment/5251655/+files/test4.py >>> [3]: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1822096/comments/3 >>> [4]: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1822096/comments/17 >>> >> >> You may want to look at d63c82df8b11b583dec8e72dfb216d8c14783876 >> (contained in 5.1.0) beause this smells like the issue you're facing. > > Thanks Michal, > I agree that this appears to be similar. > But unfortunately with 5.0 + the full 9 patch series leading into d63c82df still > triggers the deadlock that we found. > So it seems to be a new issue :-/ > > As I said before - any further suggestions (on commits to test and/or > how to resolve with new changes) are welcome. > Thanks in advance! OKay, how about this then: diff --git a/src/util/virfdstream.c b/src/util/virfdstream.c index 1bc43e20a1..0c0bb3ae78 100644 --- a/src/util/virfdstream.c +++ b/src/util/virfdstream.c @@ -680,6 +680,9 @@ static int virFDStreamCloseInt(virStreamPtr st, bool streamAbort) { virFDStreamDataPtr fdst; + virFDStreamInternalCloseCb icbCb; + void *icbOpaque; + virFDStreamInternalCloseCbFreeOpaque icbFreeOpaque; virStreamEventCallback cb; void *opaque; int ret; @@ -730,10 +733,17 @@ virFDStreamCloseInt(virStreamPtr st, bool streamAbort) /* call the internal stream closing callback */ if (fdst->icbCb) { - /* the mutex is not accessible anymore, as private data is null */ - (fdst->icbCb)(st, fdst->icbOpaque); - if (fdst->icbFreeOpaque) - (fdst->icbFreeOpaque)(fdst->icbOpaque); + icbCb = fdst->icbCb; + icbOpaque = fdst->icbOpaque; + icbFreeOpaque = fdst->icbFreeOpaque; + + virObjectUnlock(fdst); + + (icbCb)(st, icbOpaque); + if (icbFreeOpaque) + (icbFreeOpaque)(icbOpaque); + + virObjectLock(fdst); } if (fdst->dispatching) { Michal -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list