Not sure you followed the exact meaning from the hacking page - that's an awfully long commit message string. It should be a single line of up to about 70 chars with "details" in the commit message itself somewhat like the original patch, e.g.: "remote: Fix use after free reference for connection pointer" followed by a "solid summary" for what's changing in the commit message. There's many examples... Those details should contain enough information so that someone reviewing won't need to attempt to entirely replicate the research you did to discover/diagnose the problem. For these types of "edge conditions" involving multiple references to objects and the order of free or something really complicated I'm always in favor of adding as much detail as you can recall to provide enough of a frame of reference for the reviewer to be able to paint the same picture in their mind as you have (now) in yours. Not only is it really helpful, but it makes review/triage a whole lot easier. In particular some details how you discovered the bug, how to reproduce, and why you feel the fix is where you put it. Such as, what setup and path do you feel closed the connection before and what shreds of evidence are there. Also your patch and any attached traces should be done against the top of the git HEAD... I don't believe that's the case here since the line numbers in your trace don't necessarily match up to the current HEAD. This isn't an area of the code that I know well, so providing some specifics would have helped this reader understand the scope of the problem. I'm still not sure what's being fixed. I can chase the "normal" close paths, but it's not clear what abnormal path you're attempt to fix. I did do a bit of research, at first I focused too much on the virConnectPtr adjustments, but eventually I started digging into the 'client' and callback/event/state queues and I think I made a discovery described in the stack trace. On 03/08/2017 01:39 AM, xinhua.Cao wrote: > #0 0x00007fda223d66d8 in virClassIsDerivedFrom (klass=0xdeadbeef, parent=0x7fda24c81b40) at util/virobject.c:169 > #1 0x00007fda223d6a1e in virObjectIsClass (anyobj=anyobj@entry=0x7fd9e575b400, klass=<optimized out>) at util/virobject.c:365 > #2 0x00007fda223d6a44 in virObjectLock (anyobj=0x7fd9e575b400) at util/virobject.c:317 > #3 0x00007fda22507f71 in virNetServerClientSendMessage (client=client@entry=0x7fd9e575b400, msg=msg@entry=0x7fd9ec30de90) > at rpc/virnetserverclient.c:1422 This is where things fall apart - virNetServerClientSendMessage takes a 'client' pointer of virNetServerClientPtr. It seems by the time you get to that point the 'client' is already free'd (or virObjectUnref()'d) However, the code you change in remoteClientFreeFunc (and remoteClientCloseFunc) takes a daemonClientPrivate, so I'm missing the connection with the changes you made. It would seem you need to find the needle in the haystack where the virNetServerClientPtr is disposed perhaps improperly or too early. Perhaps some code that doesn't do a virObjectRef() when it assigns the client to something else. Code such as in remote.c where the following occurs (multiple times): if (VIR_ALLOC(callback) < 0) goto cleanup; callback->client = client; Which could be considered similar to how the virConnectPtr is handled in virObjectEventCallbackListAddID where there's a : if (VIR_ALLOC(cb) < 0) goto cleanup; cb->conn = virObjectRef(conn); which corresponds to the virObjectUnref()'s done later for conn. If this is the 'real' fix, then 1. Ensure to virObjectUnref(callback->client) if we fail to append or register the callback 2. Modify remoteEventCallbackFree to virObjectUnref(calback->client) the client (of course you'll need to define variable such as "daemonClientEventCallbackPtr callback = opaque;" too. In any case, since I'd already done a review pass on the code before the type A engineer in me kicked in to try and find out what the problem was - I'll leave that in the following sections. Perhaps it'll help you out in future libvirt patch submissions.... > #4 0x00007fda230d714d in remoteDispatchObjectEventSend (client=0x7fd9e575b400, program=0x7fda24c844e0, procnr=348, > proc=0x7fda2310e5e0 <xdr_remote_domain_event_callback_tunable_msg>, data=0x7ffc3857fdb0) at remote.c:3803 > #5 0x00007fda230dd71b in remoteRelayDomainEventTunable (conn=<optimized out>, dom=0x7fda27cd7660, params=0x7fda27f3aae0, nparams=1,opaque=0x7fd9e6c99e00) at remote.c:1033 > #6 0x00007fda224484cb in virDomainEventDispatchDefaultFunc (conn=0x7fda27cd0120, event=0x7fda2736ea00, > cb=0x7fda230dd610 <remoteRelayDomainEventTunable>, cbopaque=0x7fd9e6c99e00) at conf/domain_event.c:1910 > #7 0x00007fda22446871 in virObjectEventStateDispatchCallbacks (callbacks=<optimized out>, callbacks=<optimized out>, event=0x7fda2736ea00,state=0x7fda24ca3960) at conf/object_event.c:722 > #8 virObjectEventStateQueueDispatch (callbacks=0x7fda24c65800, queue=0x7ffc3857fe90, state=0x7fda24ca3960) at conf/object_event.c:736 > #9 virObjectEventStateFlush (state=0x7fda24ca3960) at conf/object_event.c:814 > #10 virObjectEventTimer (timer=<optimized out>, opaque=0x7fda24ca3960) at conf/object_event.c:560 > #11 0x00007fda223ae8b9 in virEventPollDispatchTimeouts () at util/vireventpoll.c:458 > #12 virEventPollRunOnce () at util/vireventpoll.c:654 > #13 0x00007fda223ad1d2 in virEventRunDefaultImpl () at util/virevent.c:314 > #14 0x00007fda225046cd in virNetDaemonRun (dmn=0x7fda24c775c0) at rpc/virnetdaemon.c:818 > #15 0x00007fda230d6351 in main (argc=<optimized out>, argv=<optimized out>) at libvirtd.c:1623 > > Let's clean all event callback when client close. > > Signed-off-by: xinhua.Cao <caoxinhua@xxxxxxxxxx> > > this patch resolved the bug which reported by Wang King. and the prior patch link is http://www.redhat.com/archives/libvir-list/2017-February/msg00452.html > --- > daemon/remote.c | 60 +++++++++++++++++++++++++++------------------------------ > 1 file changed, 28 insertions(+), 32 deletions(-) > While this does apply with git am -3, it didn't with just 'git am' which says to me you're not sending a patch based on git HEAD. It's not like daemon/remote.c changes that much - although Jan 5 there was a commit which certain alters the line numbers above by a bit... > diff --git a/daemon/remote.c b/daemon/remote.c > index 4aa43c223..22330307f 100644 > --- a/daemon/remote.c > +++ b/daemon/remote.c > @@ -100,6 +100,7 @@ static void make_nonnull_node_device(remote_nonnull_node_device *dev_dst, virNod > static void make_nonnull_secret(remote_nonnull_secret *secret_dst, virSecretPtr secret_src); > static void make_nonnull_nwfilter(remote_nonnull_nwfilter *net_dst, virNWFilterPtr nwfilter_src); > static void make_nonnull_domain_snapshot(remote_nonnull_domain_snapshot *snapshot_dst, virDomainSnapshotPtr snapshot_src); > +static void remoteClientCleanEventCallbacks(struct daemonClientPrivate *priv); No need for a forward declaration in this case since the static function would be declared first. > > static int > remoteSerializeDomainDiskErrors(virDomainDiskErrorPtr errors, > @@ -1389,24 +1390,15 @@ void remoteRelayConnectionClosedEvent(virConnectPtr conn ATTRIBUTE_UNUSED, int r > &msg); > } > > -/* > - * You must hold lock for at least the client > - * We don't free stuff here, merely disconnect the client's > - * network socket & resources. > - * We keep the libvirt connection open until any async > - * jobs have finished, then clean it up elsewhere > - */ > -void remoteClientFreeFunc(void *data) > +static void remoteClientCleanEventCallbacks(struct daemonClientPrivate *priv) The "newer" style of function would be: static void remoteClientCleanEventCallbacks(struct daemonClientPrivate *priv) { } If you find this important, then doing something like this should have been a separate patch prior to the one that you fixes the bug seen... That is one that does nothing more than create a separate function or perhaps in this case a C macro do obfuscate those repeated lines for each event list. We call these code motion or refactoring patches. They need a very short summary just indicating that you're refactoring code. > { > - struct daemonClientPrivate *priv = data; > - > /* Deregister event delivery callback */ > - if (priv->conn) { > - virIdentityPtr sysident = virIdentityGetSystem(); > + if (priv && priv->conn) { This type of check could be in the caller(s)... That would alter your indention too... > size_t i; > - > + virIdentityPtr sysident = virIdentityGetSystem(); > virIdentitySetCurrent(sysident); > > + virMutexLock(&priv->lock); I can certainly see why it's felt this might be important, but since each loop calls virConnectDomainEventDeregisterAny which ends up locking in the remote* functions, I'm not sure they're necessary. I would be concerned over deadlock once the vir* function is called. > for (i = 0; i < priv->ndomainEventCallbacks; i++) { > int callbackID = priv->domainEventCallbacks[i]->callbackID; > if (callbackID < 0) { > @@ -1420,6 +1412,7 @@ void remoteClientFreeFunc(void *data) > VIR_WARN("unexpected domain event deregister failure"); > } > VIR_FREE(priv->domainEventCallbacks); > + priv->ndomainEventCallbacks = 0; You've added these "= 0" for domain, network, and qemu events, but not the others. Any special reason why the others were not included? Regardless as I look through the code - register and deregister functions seem to properly incr/decr the value - so I'm not sure this is necessary especially since the 'priv' gets VIR_FREE()'d shortly. FWIW: If there's a path you find that doesn't decr properly, then that would need to be a separate patch. > > for (i = 0; i < priv->nnetworkEventCallbacks; i++) { > int callbackID = priv->networkEventCallbacks[i]->callbackID; > @@ -1435,21 +1428,7 @@ void remoteClientFreeFunc(void *data) > VIR_WARN("unexpected network event deregister failure"); > } > VIR_FREE(priv->networkEventCallbacks); > - > - for (i = 0; i < priv->nstorageEventCallbacks; i++) { > - int callbackID = priv->storageEventCallbacks[i]->callbackID; > - if (callbackID < 0) { > - VIR_WARN("unexpected incomplete storage pool callback %zu", i); > - continue; > - } > - VIR_DEBUG("Deregistering remote storage pool event relay %d", > - callbackID); > - priv->storageEventCallbacks[i]->callbackID = -1; > - if (virConnectStoragePoolEventDeregisterAny(priv->conn, > - callbackID) < 0) > - VIR_WARN("unexpected storage pool event deregister failure"); > - } > - VIR_FREE(priv->storageEventCallbacks); ?? What happened to the storage event callbacks? They cannot just disappear. > + priv->nnetworkEventCallbacks = 0; > > for (i = 0; i < priv->nqemuEventCallbacks; i++) { > int callbackID = priv->qemuEventCallbacks[i]->callbackID; > @@ -1465,28 +1444,45 @@ void remoteClientFreeFunc(void *data) > VIR_WARN("unexpected qemu monitor event deregister failure"); > } > VIR_FREE(priv->qemuEventCallbacks); > + priv->nqemuEventCallbacks = 0; > > if (priv->closeRegistered) { > if (virConnectUnregisterCloseCallback(priv->conn, > remoteRelayConnectionClosedEvent) < 0) > VIR_WARN("unexpected close callback event deregister failure"); > } > - > - virConnectClose(priv->conn); > - > + virMutexUnlock(&priv->lock); > virIdentitySetCurrent(NULL); > virObjectUnref(sysident); > } > +} > + > + > +/* > + * You must hold lock for at least the client > + * We don't free stuff here, merely disconnect the client's > + * network socket & resources. > + * We keep the libvirt connection open until any async > + * jobs have finished, then clean it up elsewhere > + */ > +void remoteClientFreeFunc(void *data) And this would be: void remoteClientFreeFunc(void *data) > +{ > + struct daemonClientPrivate *priv = data; > + if (!priv || !priv->conn) > + return; If I check the caller from a "normal" path: virEventPollRunOnce => virEventPollCleanupHandles => virNetSocketEventFree => virObjectUnref => virNetServerClientDispose I see that virNetSocketEventFree would set sock->opaque to NULL, so what other path(s) would call into here where perhaps the *data argument wouldn't be reset to NULL. IOW: How would data == NULL at this point? > + > + remoteClientCleanEventCallbacks(priv); > + virConnectClose(priv->conn); > > VIR_FREE(priv); > } > > - Policy is two blank lines between functions, so don't remove this one. > static void remoteClientCloseFunc(virNetServerClientPtr client) > { > struct daemonClientPrivate *priv = virNetServerClientGetPrivateData(client); > > daemonRemoveAllClientStreams(priv->streams); > + remoteClientCleanEventCallbacks(priv); Putting this here is what led me down the path of why would moving this make a difference. Since remoteClientFreeFunc is essentially called after this (in the normal path at least) - sure moving the cleanup here could be a solution, but it could also be masking the "root cause". Besides, this code has been this way for *ages* - changing something like this always requires extra scrutiny as I have found out through much trial and error that there's a reason for "current" format. In this case, there's processing in virNetDaemonRun between the virHashForEach for the daemonServerProcessClients (which could call this function) and the virEventRunDefaultImpl/virEventPollRunOnce code first call to virEventPollCleanupHandles which ends up calling virNetServerClientDispose. Something tells me that processing is important; otherwise, when first developed the event cleanup would have been put here (see commit id 'df0b57a95'). In any case, in light of the virObjectRef(client) - I don't think this is necessary. Looking forward to the next series of patches... John > } > > > -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list