On Tue, Sep 11, 2007 at 04:56:13AM -0400, Daniel Veillard wrote: > On Tue, Sep 11, 2007 at 12:59:59AM +0100, Daniel P. Berrange wrote: > > I noticed that when using the SSH tunnel for the remote driver I ended up > > with alot of zombie SSH processes. We simply forgot to waitpid() on the > > child when a connection attempt failed, or when shutting down an open remote > > connection. Attached is a possible patch > > Looks fine, like Rich maybe a bit of refactoring might be good. The > only worries I have is the following scenario: > - the ssh process dies > - libvirt based application takes some time to notice it > - the OS span a new process with the same PID after a PID rollabck > (not completely unlikely since the ssh may have been started a long > time ago) > - we end-up killing a random process in the system > > I think this is mostly avoidable by resetting priv->pid to -1 or 0 on > any child communication error, and before doing the kill in the patch. > Even better would be to be able to check that the process corresponding > to priv->pid is still a child of the current process, I wonder if this > can be achieved without blocking with an initial waitpid() After some testing I believe we can safely do away with kill() completely. If we simply close(priv->sock) which is our end of the socketpair used to talk to SSH, then SSH detects the end-of-file condition and exits of its on accord. So there's no need to kill() - just close the socket and waitpid() seems to do the trickm, unless I'm missing something bad. Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|
Index: src/remote_internal.c =================================================================== RCS file: /data/cvs/libvirt/src/remote_internal.c,v retrieving revision 1.22 diff -u -p -r1.22 remote_internal.c --- src/remote_internal.c 12 Sep 2007 10:33:48 -0000 1.22 +++ src/remote_internal.c 14 Sep 2007 02:09:38 -0000 @@ -62,6 +62,7 @@ struct private_data { int magic; /* Should be MAGIC or DEAD. */ int sock; /* Socket. */ + pid_t pid; /* PID of tunnel process */ int uses_tls; /* TLS enabled on socket? */ gnutls_session_t session; /* GnuTLS session (if uses_tls != 0). */ char *type; /* Cached return from remoteType. */ @@ -578,7 +579,7 @@ doRemoteOpen (virConnectPtr conn, struct /*FALLTHROUGH*/ case trans_ext: { - int pid; + pid_t pid; int sv[2]; /* Fork off the external process. Use socketpair to create a private @@ -617,6 +618,7 @@ doRemoteOpen (virConnectPtr conn, struct /* Parent continues here. */ close (sv[1]); priv->sock = sv[0]; + priv->pid = pid; } } /* switch (transport) */ @@ -646,6 +648,14 @@ doRemoteOpen (virConnectPtr conn, struct gnutls_bye (priv->session, GNUTLS_SHUT_RDWR); close (priv->sock); } + if (priv->pid > 0) { + pid_t reap; + do { + reap = waitpid(priv->pid, NULL, 0); + if (reap == -1 && errno == EINTR) + continue; + } while (reap != -1 && reap != priv->pid); + } /* Free up the URL and strings. */ xmlFreeURI (uri); @@ -1170,6 +1180,15 @@ doRemoteClose (virConnectPtr conn, struc gnutls_bye (priv->session, GNUTLS_SHUT_RDWR); close (priv->sock); + if (priv->pid > 0) { + pid_t reap; + do { + reap = waitpid(priv->pid, NULL, 0); + if (reap == -1 && errno == EINTR) + continue; + } while (reap != -1 && reap != priv->pid); + } + /* See comment for remoteType. */ if (priv->type) free (priv->type);
-- Libvir-list mailing list Libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list