On Tue, Sep 11, 2007 at 12:59:59AM +0100, Daniel P. Berrange wrote:
> I noticed that when using the SSH tunnel for the remote driver I ended up
> with a lot of zombie SSH processes. We simply forgot to waitpid() on the
> child when a connection attempt failed, or when shutting down an open remote
> connection. Attached is a possible patch.

  Looks fine. Like Rich, I think a bit of refactoring might be good.
The only worry I have is the following scenario:
  - the ssh process dies
  - the libvirt-based application takes some time to notice it
  - the OS spawns a new process with the same PID after the PID counter
    wraps around (not completely unlikely, since the ssh may have been
    started a long time ago)
  - we end up killing a random process in the system

I think this is mostly avoidable by resetting priv->pid to -1 or 0 on any
child communication error, and before doing the kill in the patch. Even better
would be to check that the process corresponding to priv->pid is still a
child of the current process; I wonder if this can be achieved without
blocking, with an initial waitpid().

  Maybe I'm too cautious; I'm fine with the principle of the patch though.

> @@ -646,6 +648,19 @@ doRemoteOpen (virConnectPtr conn, struct
>          gnutls_bye (priv->session, GNUTLS_SHUT_RDWR);
>          close (priv->sock);
>      }
> +    if (priv->pid > 0) {
> +        pid_t reap;
> +        int status, n = 0;
> +        kill(priv->pid, SIGTERM);
> +        do {
> +            if (n)
> +                usleep(n*1000);
> +            if (n > 3)
> +                kill(priv->pid, SIGKILL);
> +            reap = waitpid(priv->pid, &status, WNOHANG);
> +            n++;
> +        } while (reap != -1 && reap != priv->pid);
> +    }
>  
>      /* Free up the URL and strings. */
>      xmlFreeURI (uri);

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard@xxxxxxxxxx  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

--
Libvir-list mailing list
Libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
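Regarding the question above about checking, without blocking, whether priv->pid still names our own child: here is a minimal sketch, under stated assumptions. It relies on a standard POSIX property. A child that has exited keeps its PID as a zombie until the parent reaps it, so as long as nothing else in the process reaps it (no SIGCHLD handler calling waitpid(-1, ...), and SIGCHLD not set to SIG_IGN or SA_NOCLDWAIT), a waitpid(pid, NULL, WNOHANG) that returns 0 shows the PID is still our un-reaped child and is safe to signal. Once waitpid() returns the PID (just reaped) or fails with ECHILD (not our child), the field is reset to -1 so it is never signalled again. `struct ssh_child`, `child_still_running()`, `ssh_child_reap()` and `sleep_ms()` are hypothetical names for illustration, not libvirt code; nanosleep() stands in for usleep(), which POSIX.1-2008 removed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for the pid field of the remote driver's private data. */
struct ssh_child {
    pid_t pid;   /* > 0 while we own an un-reaped child, -1 otherwise */
};

static void sleep_ms(long ms)
{
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/* Non-blocking check: returns 1 if c->pid is our child and has not exited.
 * Otherwise the child has just been reaped (or was never ours), and
 * c->pid is reset to -1 so it can never be signalled by mistake. */
static int child_still_running(struct ssh_child *c)
{
    pid_t r;

    if (c->pid <= 0)
        return 0;
    do {
        r = waitpid(c->pid, NULL, WNOHANG);
    } while (r == -1 && errno == EINTR);
    if (r == 0)
        return 1;   /* alive and un-reaped: the PID cannot have been reused */
    c->pid = -1;    /* r == pid (reaped now) or ECHILD (not our child) */
    return 0;
}

/* Ask the child to exit, escalate to SIGKILL, and always reap it. */
static void ssh_child_reap(struct ssh_child *c)
{
    long n;

    if (!child_still_running(c))
        return;
    kill(c->pid, SIGTERM);
    for (n = 1; n <= 5; n++) {   /* sleeps of 1..5 ms, about 15 ms in total */
        sleep_ms(n);
        if (!child_still_running(c))
            return;
    }
    kill(c->pid, SIGKILL);       /* SIGKILL cannot be caught or ignored */
    while (waitpid(c->pid, NULL, 0) == -1 && errno == EINTR)
        ;
    c->pid = -1;
}
```

Calling ssh_child_reap() from both the connection-failure path and the close path is harmless: the second call sees pid == -1 and returns immediately. The final blocking waitpid() after SIGKILL normally returns at once, but it could stall if the child is stuck in an uninterruptible kernel wait.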