On Mon, Oct 10, 2016 at 04:03:32PM +0800, Michal Privoznik wrote:
On 07.10.2016 19:52, Martin Kletzander wrote:I don't have any justification for this except an empirical one: with this patch the code from Bug 1363628 doesn't crash after "leaking". I currently don't have properly working valgrind, but I'm working on it and it might shed some light into this (although it might also not happen due to the slowdown). However in the meantime I attempted some analysis and I got even more confused than before, I guess. With this patch the code doesn't crash, even though virObjectEventStateQueueDispatch() properly skips callbacks marked as deleted. What I feel is even more weird, that if I duplicate the purgatory function (instead of moving it), it crashes again, and I feel like even more often. Weirdly-fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1363628 Signed-off-by: Martin Kletzander <mkletzan@xxxxxxxxxx> --- src/conf/object_event.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/conf/object_event.c b/src/conf/object_event.c index e5af4be68a7e..4066b4673b42 100644 --- a/src/conf/object_event.c +++ b/src/conf/object_event.c @@ -821,13 +821,13 @@ virObjectEventStateFlush(virObjectEventStatePtr state) if (state->timer != -1) virEventUpdateTimeout(state->timer, -1); + /* Purge any deleted callbacks */ + virObjectEventCallbackListPurgeMarked(state->callbacks); + virObjectEventStateQueueDispatch(state, &tempQueue, state->callbacks); - /* Purge any deleted callbacks */ - virObjectEventCallbackListPurgeMarked(state->callbacks); - state->isDispatching = false; virObjectEventStateUnlock(state); }Well, I have no idea either, because virObjectEventStateQueueDispatch() calls virObjectEventStateDispatchCallbacks() which in turn calls virObjectEventDispatchMatchCallback() which returns false if the callback->deleted is truem and thus the callback is skipped - just like it is after this patch of yours. This whole bug feels like a refcounting problem and IMO your patch is just making the scenario more unlikely.
So I've found out what it is thanks to Luayo Huang. And if I understand it correctly, it won't happen with stateful drivers. Stateless, however, will have this problem. The problem is that testConnectClose() clears the driver structure, including calling virObjectEventStateFree() on the eventState. It clears the whole eventState, but it can happen while the eventState is dispatching. So there's the race. And it would also explain what I was seeing all the time. qemuCleanup() is the only function in qemu driver that would cleanup the eventState and since that gets called from the daemon only after other stuff is cleaned up, it won't happen there. I'll try turning virObjectEventStatePtr into virObject, we'll have refcounting then and it will not disappear until it's stopped being used. I'll see if that helps or not, but it could. Martin
Attachment:
signature.asc
Description: Digital signature
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list