On 10.10.2016 21:13, Martin Kletzander wrote: > On Mon, Oct 10, 2016 at 04:03:32PM +0800, Michal Privoznik wrote: >> On 07.10.2016 19:52, Martin Kletzander wrote: >>> I don't have any justification for this except an empirical one: with >>> this patch the code from Bug 1363628 doesn't crash after "leaking". >>> >>> I currently don't have properly working valgrind, but I'm working on it >>> and it might shed some light into this (although it might also not >>> happen due to the slowdown). >>> >>> However in the meantime I attempted some analysis and I got even more >>> confused than before, I guess. With this patch the code doesn't crash, >>> even though virObjectEventStateQueueDispatch() properly skips callbacks >>> marked as deleted. What I feel is even more weird, that if I duplicate >>> the purgatory function (instead of moving it), it crashes again, and I >>> feel like even more often. >>> >>> Weirdly-fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1363628 >>> >>> Signed-off-by: Martin Kletzander <mkletzan@xxxxxxxxxx> >>> --- >>> src/conf/object_event.c | 6 +++--- >>> 1 file changed, 3 insertions(+), 3 deletions(-) >>> >>> diff --git a/src/conf/object_event.c b/src/conf/object_event.c >>> index e5af4be68a7e..4066b4673b42 100644 >>> --- a/src/conf/object_event.c >>> +++ b/src/conf/object_event.c >>> @@ -821,13 +821,13 @@ virObjectEventStateFlush(virObjectEventStatePtr >>> state) >>> if (state->timer != -1) >>> virEventUpdateTimeout(state->timer, -1); >>> >>> + /* Purge any deleted callbacks */ >>> + virObjectEventCallbackListPurgeMarked(state->callbacks); >>> + >>> virObjectEventStateQueueDispatch(state, >>> &tempQueue, >>> state->callbacks); >>> >>> - /* Purge any deleted callbacks */ >>> - virObjectEventCallbackListPurgeMarked(state->callbacks); >>> - >>> state->isDispatching = false; >>> virObjectEventStateUnlock(state); >>> } >> >> Well, I have no idea either, because virObjectEventStateQueueDispatch() >> calls virObjectEventStateDispatchCallbacks() which in turn calls >> virObjectEventDispatchMatchCallback() which returns false if the >> callback->deleted is truem and thus the callback is skipped - just like >> it is after this patch of yours. >> >> This whole bug feels like a refcounting problem and IMO your patch is >> just making the scenario more unlikely. >> > > So I've found out what it is thanks to Luayo Huang. And if I understand > it correctly, it won't happen with stateful drivers. Stateless, > however, will have this problem. > > The problem is that testConnectClose() clears the driver structure, > including calling virObjectEventStateFree() on the eventState. It > clears the whole eventState, but it can happen while the eventState is > dispatching. So there's the race. And it would also explain what I was > seeing all the time. qemuCleanup() is the only function in qemu driver > that would cleanup the eventState and since that gets called from the > daemon only after other stuff is cleaned up, it won't happen there. > Ah, yeah. I can see that in the code now too. Good job! > I'll try turning virObjectEventStatePtr into virObject, we'll have > refcounting then and it will not disappear until it's stopped being > used. I'll see if that helps or not, but it could. Yep, this is classical example of why we need refcounted objects. Michal -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list