Re: [PATCH v2 2/2] remote: free client all event callbacks at remoteClientCloseFunc

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





在 2017/11/12 22:45, John Ferlan 写道:

On 11/11/2017 03:20 AM, xinhua.Cao wrote:

在 2017/11/11 5:43, John Ferlan 写道:
On 11/06/2017 12:47 AM, xinhua.Cao wrote:
base on commit fe8f1c8b8650c8ab5e7d27338f4538582651bd14, we solve
libvirt coredump problem, but it introduce a memory leak sense:

1. one client register a domain event such as reboot event
2. and client was terminated unexpectly, such as kill -9,
unexpectedly

then this client object will not free at libvirtd service program.

remoteDispatchConnectDomainEventCallbackRegisterAny reference the
client,
but when client was terminated before it call deRegisterAny,the
reference
of client will not reduced to zero. so the memory leak take place.
What's the difference between the path that would terminate normally and
the one that terminates abnormally. It seems to me there might be (an)
extra Ref(s) on the client which prevents virNetServerClientDispose from
calling it's @privateDataFreeFunc which would call remoteClientFreeFunc
since remoteClientInitHook worked properly.

That's what needs to be found.
If client terminate normally, client would call
virConnectDomainEventDeregisterAny to
clear eventCallbacks. Then libvirt daemon will call
remoteDispatchConnectDomainEventDeregisterAny
-> remoteEventCallbackFree -> virObjectUnref(callback->client) to free
eventCallbacks and unRef client.
just think if client don't call  virConnectDomainEventDeregisterAny and
be killed. then there is no way to call call
remoteDispatchConnectDomainEventDeregisterAny -> remoteEventCallbackFree
-> virObjectUnref(callback->client)
so client's ref can't down to zero. so remoteClientFreeFunc can't be
called. then client object memory leak.
So the client/sample code is not calling Deregister because it has no
"handling" of an abnormal exit. That is no signal handler to call
Deregister in the event some outside force kills the client/sample code
while it's processing.

So a different solution would be to add signal handling to the code in
order to handle the kill and make the Deregister (and close) calls.
signal handler can handle such as kill -15 signal, But can't handle kill -9 signal.
so we also need to resolve it in libvirt
The difference is call remoteDispatchConnectDomainEventDeregisterAny or
don't call remoteDispatchConnectDomainEventDeregisterAny
to Unref(callback->client).

IIUC, the whole reason the Deregister code is in the Free Func is to
"handle" the case where that last client reference was Unref'd, but
someone didn't perform the actual Deregister. The logic predates my
libvirt involvement...

Still because of commit id 'fe8f1c8b' where we generate a REF for the
Register and that's transparent to the consumer (e.g. how would they
know they need to ensure that Deregister is called), thus the purpose of
this patch is to find a way to Deregister if it's determined that the
consumer hasn't by the time of the "last" REF we'd have [trying to write
a commit message here too].

This solution to this problem is to alter the processing to have the
remoteClientCloseFunc handle performing the Deregister calls instead of
the remoteClientFreeFunc because there's no way FreeFunc would be called
unless the Deregister was already called. In fact, if you think about it
if the CloseFunc does the Deregister why would the FreeFunc also include
that call?  IOW: It should only need to be in one or the other.

I know you've already posted patch v3; however, I also think you need to
read commit id '8294aa0c'. If you're moving the Deregister handling to
the close func, then I would think that the "reason" for that commit
still applies.

When you generate a v4, my suggestion is to generate 2 patches.

   1. Just the remoteClientFreePrivateCallbacks separation including the
sysident handling. From my read of the code, the virConnectClose doesn't
have the same issue.
I am unfamiliarity with sysident. then research on it.  It used for polkit check
but I still can't confirm it.
   2. Move the call to remoteClientFreePrivateCallbacks from FreeFunc to
CloseFunc with a commit message that explains why this is necessary
since commit id 'fe8f1c8b'  (BTW: No need to supply the complete hash, 8
significant digits are usually plenty).
this two advices are very useful. thanks very much.

I supplied a bit more details w/r/t what I found for REF/UNREF - I
assume it essentially matches what you've found. But having it here
certainly helps - so thanks for providing it along with the example.

here is my test code

#include<stdio.h>
#include<pthread.h>
#include<libvirt/libvirt.h>

int _DomainStatusCallback(virConnectPtr conn, virDomainPtr dom, int event,
         int detail, void *opaque);
void timeOutCallback(int timer, void *opaque);

int run = 1;

int _DomainStatusCallback(virConnectPtr conn, virDomainPtr dom, int event,
         int detail, void *opaque)
{
     char uuid[128] = { 0 };
     (void) virDomainGetUUIDString(dom, uuid);
     printf("domain %s come a event %d\n", uuid, event);
}

void _timeOutCallback(int timer, void *opaque)
{
     printf("time out is comming\n");
}

void *event_thread(void *opaque)
{
     int i =0;
     if (virEventRegisterDefaultImpl() < 0)
     {
         return NULL;
     }

     virConnectPtr conn = virConnectOpen("qemu:///system");
     if (conn == NULL)
     {
         return NULL;
     }

     int callback = virConnectDomainEventRegisterAny(conn, NULL,
VIR_DOMAIN_EVENT_ID_LIFECYCLE,
             _DomainStatusCallback, NULL, NULL);
     int timeout = virEventAddTimeout(1000,_timeOutCallback,NULL,NULL);

     while (run && virConnectIsAlive(conn) == 1)
     {
        virEventRunDefaultImpl();
     }

     sleep(5);
     virConnectDomainEventDeregisterAny(conn, callback);
     virConnectClose(conn);
}


int main(int argc, char *argv[])
{
     pthread_t pid;
     int ret = 0;
     ret = pthread_create(&pid, NULL, event_thread, NULL);
     sleep(100);
     run = 0;
     sleep(10);
}

when I don't call remoteClientFreePrivateCallbacks at
remoteClientCloseFunc,
here is client refs change when I kill the client.
(gdb) c
Continuing.
Hardware watchpoint 6: *(int *) 0x5637edc888f4

Old value = 4
New value = 5
virObjectRef (anyobj=anyobj@entry=0x5637edc888f0) at util/virobject.c:296
296         PROBE(OBJECT_REF, "obj=%p", obj);
(gdb) bt
#0  virObjectRef (anyobj=anyobj@entry=0x5637edc888f0) at
util/virobject.c:296
#1  0x00005637ec61d643 in
remoteDispatchConnectDomainEventCallbackRegisterAny (server=<optimized
out>, msg=<optimized out>,
     ret=0x7f054c0008e0, args=0x7f054c0008c0, rerr=0x7f057ba6cc90,
client=0x5637edc888f0) at remote.c:4418
Although the line numbers are different with top of tree, this is the
RegisterAny REF on the client...

[...]

So that says to me we generally run with 3-4 REFs (without any Register
Event occurring).  AFAICT, the following is the list:

    1. Adding to the srv->clients (virNetServerAddClient)
    2. Adding IO Callback (virNetServerClientRegisterEvent)
    3. Adding Keep Alive (virNetServerClientInitKeepAlive and
                          virKeepAliveStart)

The 4th is a bit trickier, but I believe it's REF'd during the call to
virNetServerClientDispatchRead...

Old value = 5
New value = 4
virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259
259         PROBE(OBJECT_UNREF, "obj=%p", obj);
Continuing.
[Switching to Thread 0x7f058eb1f840 (LWP 3878)]
Hardware watchpoint 9: *(int *) 0x5637edc88634
I'm assuming this comes from the opposite end of the DispatchRead -
there's no trace here to prove it, but that's not necessarily surprising
either as it's not easy to find where the Unref would naturally occur
without an error path occurring....

Old value = 4
New value = 5
virObjectRef (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:296
296         PROBE(OBJECT_REF, "obj=%p", obj);
#0  virObjectRef (anyobj=anyobj@entry=0x5637edc88630) at
util/virobject.c:296
#1  0x00005637ec65299d in virNetServerClientClose
(client=0x5637edc88630) at rpc/virnetserverclient.c:978
[...]

Old value = 5
New value = 4
virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:259
259         PROBE(OBJECT_UNREF, "obj=%p", obj);
#0  virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at
util/virobject.c:259
#1  0x00005637ec6529bd in virNetServerClientClose
(client=0x5637edc88630) at rpc/virnetserverclient.c:982
This pair of REF/UNREF in virNetServerClientClose when client->keepalive
is set.

[...]

Old value = 4
New value = 5
virObjectRef (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:296
296         PROBE(OBJECT_REF, "obj=%p", obj);
#0  virObjectRef (anyobj=anyobj@entry=0x5637edc88630) at
util/virobject.c:296
#1  0x00005637ec6529d1 in virNetServerClientClose
(client=0x5637edc88630) at rpc/virnetserverclient.c:987
[...]

Old value = 5
New value = 4
virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:259
259         PROBE(OBJECT_UNREF, "obj=%p", obj);
#0  virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at
util/virobject.c:259
#1  0x00005637ec6529ee in virNetServerClientClose
(client=0x5637edc88630) at rpc/virnetserverclient.c:991
This pair of REF/UNREF in virNetServerClientClose when
client->privateDataCloseFunc is set.

[...]

Old value = 4
New value = 3
virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:259
259         PROBE(OBJECT_UNREF, "obj=%p", obj);
#0  virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at
util/virobject.c:259
#1  0x00005637ec650705 in virNetServerProcessClients
(srv=0x5637edc6bf10) at rpc/virnetserver.c:873
Again my line numbers are off a bit, but this would be the for the UNREF
after VIR_DELETE_ELEMENT when removed from srv->clients

[...]

Old value = 3
New value = 2
virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259
259         PROBE(OBJECT_UNREF, "obj=%p", obj);
#0  virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259
#1  0x00007f058de4199b in virObjectUnref (anyobj=<optimized out>) at
util/virobject.c:265
#2  0x00007f058de0db34 in virEventPollCleanupTimeouts () at
util/vireventpoll.c:543
[...]

This one is the opposite end of the virObjectRef done for the keep alive
timer. See virNetServerClientInitKeepAlive and virKeepAliveStart usage
of virEventAddTimeout.

Old value = 2
New value = 1
virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259
259         PROBE(OBJECT_UNREF, "obj=%p", obj);
#0  virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259
#1  0x00005637ec656bee in virNetSocketEventFree
(opaque=opaque@entry=0x5637edc8a5b0) at rpc/virnetsocket.c:2152
#2  0x00007f058de0dd34 in virEventPollCleanupHandles () at
util/vireventpoll.c:592
[...]

This one is the opposite end of the virObjectRef done in
virNetServerClientRegisterEvent prior to virNetSocketAddIOCallback which
calls virEventAddHandle.

Theoretically the "last" REF...

John
Continuing.

so the client refs alway have 1 when I register one event callbacks.
this patch will deRegister all event callbacks when client was close.
---
FWIW: This commit message was a bit difficult to follow - I know it's
primarily a language barrier thing - so maybe if you just construct some
function code flow for both initialization/open and destruction/close
it'd help... Paths that work and the path that you think is broken...
That can be done in this section after the "---" and before the changed
API's.

I assume you've done a lot of research, but the details of that research
can be difficult to just "pick up on".  Typically someone will provide
some sort of valgrind output as an example.

I just have a sense that this could boil down to a path (or two) that
are doing the virObjectRef without the expected virObjectUnref occurring
- something perhaps to do with how @wantClose is handled in the abnormal
termination case. Nothing jumps out at me though even after looking at
the code for a while.


   daemon/remote.c | 46 ++++++++++++++++++++++++++++------------------
   1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/daemon/remote.c b/daemon/remote.c
index cbcb6e8..466b2e7 100644
--- a/daemon/remote.c
+++ b/daemon/remote.c
@@ -1689,23 +1689,10 @@ void
remoteRelayConnectionClosedEvent(virConnectPtr conn ATTRIBUTE_UNUSED,
int r
           neventCallbacks = 0; \
       } while (0);
   -/*
- * You must hold lock for at least the client
- * We don't free stuff here, merely disconnect the client's
- * network socket & resources.
- * We keep the libvirt connection open until any async
- * jobs have finished, then clean it up elsewhere
- */
-void remoteClientFreeFunc(void *data)
+static void
+remoteClientFreePrivateCallbacks(struct daemonClientPrivate *priv)
   {
-    struct daemonClientPrivate *priv = data;
-
-    /* Deregister event delivery callback */
-    if (priv->conn) {
-        virIdentityPtr sysident = virIdentityGetSystem();
-
-        virIdentitySetCurrent(sysident);
-
+    if (priv && priv->conn) {
Well each of the callers has already dereferenced @priv so it's a bit
too late to check it now!
yes, I also think this check is too late. this placement can only check
priv->conn.
           DEREG_CB(priv->conn, priv->domainEventCallbacks,
                    priv->ndomainEventCallbacks,
                    virConnectDomainEventDeregisterAny, "domain");
@@ -1730,16 +1717,38 @@ void remoteClientFreeFunc(void *data)
remoteRelayConnectionClosedEvent) < 0)
                   VIR_WARN("unexpected close callback event
deregister failure");
           }
+    }
+}
+#undef DEREG_CB
+
There should be another blank line here.
OK, I lost a blank line.
+/*
+ * You must hold lock for at least the client
+ * We don't free stuff here, merely disconnect the client's
+ * network socket & resources.
+ * We keep the libvirt connection open until any async
+ * jobs have finished, then clean it up elsewhere
+ */
+void remoteClientFreeFunc(void *data)
+{
+    struct daemonClientPrivate *priv = data;
+
+    /* Deregister event delivery callback */
+    if (priv->conn) {
+        virIdentityPtr sysident = virIdentityGetSystem();
+
+        virIdentitySetCurrent(sysident);
+
+        remoteClientFreePrivateCallbacks(priv);
             virConnectClose(priv->conn);
             virIdentitySetCurrent(NULL);
           virObjectUnref(sysident);
-    }
   +    }
       VIR_FREE(priv);
+
   }
-#undef DEREG_CB
       static void remoteClientCloseFunc(virNetServerClientPtr client)
@@ -1747,6 +1756,7 @@ static void
remoteClientCloseFunc(virNetServerClientPtr client)
       struct daemonClientPrivate *priv =
virNetServerClientGetPrivateData(client);
         daemonRemoveAllClientStreams(priv->streams);
+    remoteClientFreePrivateCallbacks(priv);
Perhaps this just gets a "if (priv->conn)" before calling...

Still if as I assume the missing Unref is found (someday), then
theoretically remoteClientFreeFunc would be called - of course the
nEventCallbacks = 0 will ensure the Dereg's dont' get called again. So
it shoudn't be a problem
In my opinion, it is safe to check point is Null. But check everywhere
is stupid.
so it is hard to determine where need check.:-)
John

   }
.


.



--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list




[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]
  Powered by Linux