Re: PATCH: Prevent zombie ssh tunnels

"Daniel P. Berrange" <berrange@xxxxxxxxxx> · Wed, 12 Sep 2007 04:31:23 +0100

On Tue, Sep 11, 2007 at 12:00:33PM +0100, Richard W.M. Jones wrote:
> Daniel Veillard wrote:
> >On Tue, Sep 11, 2007 at 11:35:46AM +0200, Gerd Hoffmann wrote:
> >>Daniel Veillard wrote:
> >>>   - the ssh process dies
> >>>   - libvirt based application takes some time to notice it
> >>>   - the OS spawns a new process with the same PID after a PID rollover
> >>Can not happen as long as libvirt hasn't asked for the exit status via
> >>waitpid() because the pid is still in use by the zombie ssh process.
> >
> >  Hum, which is precisely why we need the patch. Still I would feel a bit
> >better if we could check that priv->pid is a child of the current process
> >something like (getppid(priv->pid) == getpid()) test before any kill would
> >do this easily I think.
> 
> I think Gerd's point was that as long as we haven't waited for the PID 
> within this process before, the PID cannot be reused.

AFAIK there is no API to give you the parent PID of an arbitrary PID. The
getppid() call returns your own parent - you can't ask it for someone
else's parent.
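
To illustrate, here is a minimal sketch (not libvirt code; the function
name child_sees_parent is made up for this example). getppid() takes no
argument, so the only PID it can report on is the caller's own parent:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that compares its own getppid()
 * with the PID the parent recorded before forking.
 * Returns 1 if they match, 0 if they differ, -1 on error. */
int child_sees_parent(void)
{
    pid_t self = getpid();
    pid_t child = fork();
    int status;

    if (child < 0)
        return -1;
    if (child == 0) {
        /* getppid() has no PID parameter: it only answers for the caller. */
        _exit(getppid() == self ? 0 : 1);
    }
    if (waitpid(child, &status, 0) != child || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

So a check of the form getppid(priv->pid) cannot be written with this
call; the parent would have to track its children some other way.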

> That doesn't mean the situation cannot arise -- for example the main 
> program might be using other libraries as well as libvirt, and those 
> other libraries might blindly wait(2) for children.

There is an issue if the app has set SIGCHLD to SIG_IGN - the kernel will
automatically reap zombies then. This would allow the race that Daniel
illustrates above, where we might 'kill' a program that is no longer our
own SSH client.
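
A small sketch of the SIG_IGN behaviour, assuming a POSIX.1-2001 style
system such as Linux (the function name is invented for illustration, it
is not part of libvirt). With SIGCHLD ignored the child never becomes a
zombie, so waitpid() has nothing to collect and fails with ECHILD:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: ignore SIGCHLD, fork a child that exits at once,
 * then try to wait for it. Returns the errno reported by waitpid(),
 * 0 if waitpid() unexpectedly succeeded, or -1 on setup failure. */
int wait_errno_with_sigchld_ignored(void)
{
    struct sigaction ign, old;
    int result;
    pid_t child;

    ign.sa_handler = SIG_IGN;
    ign.sa_flags = 0;
    sigemptyset(&ign.sa_mask);
    if (sigaction(SIGCHLD, &ign, &old) < 0)
        return -1;

    child = fork();
    if (child == 0)
        _exit(0);

    if (child < 0) {
        result = -1;
    } else {
        int status;
        /* Blocks until the child is gone, then fails: the kernel has
         * already discarded the exit status and freed the PID. */
        result = (waitpid(child, &status, 0) < 0) ? errno : 0;
    }

    sigaction(SIGCHLD, &old, NULL);   /* restore the previous disposition */
    return result;
}
```

Once waitpid() has failed this way, the PID is free for reuse, and that
is the window in which a later kill() could hit an unrelated process.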

In the case of cleaning up after a failed doRemoteOpen call we should be
safe enough, since we only spawned the SSH process 10 lines higher up and
the system would have to be insanely busy to cycle through 65536 PIDs before
we completed those 10 lines.

In the case of doRemoteClose we've not got a lot of good options. One is
to take the risk that SIGCHLD is SIG_IGN or that someone else called
wait(), and do the kill() anyway.
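
One way to narrow the window before doing the kill() is to ask
waitpid() with WNOHANG whether the PID is still a live child of ours.
This is only a sketch under my own assumptions: kill_child_if_ours is a
hypothetical name, not the actual patch. It does not fully close the
race when SIGCHLD is ignored, since the child could still die and be
reaped between the check and the kill().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper. Returns:
 *    1  the child was still running; SIGTERM was sent
 *    0  the child had already exited and was reaped here, no signal sent
 *   -1  the PID is not (or no longer) a waitable child of ours, or
 *       kill() failed; no signal was delivered */
int kill_child_if_ours(pid_t pid)
{
    int status;
    pid_t r;

    assert(pid > 0);   /* pid <= 0 would make kill() target process groups */

    r = waitpid(pid, &status, WNOHANG);
    if (r == pid)
        return 0;      /* already dead; we just collected the zombie */
    if (r < 0)
        return -1;     /* typically ECHILD: reaped elsewhere, PID may be reused */

    /* r == 0: the PID names a child of ours that has not exited yet. */
    if (kill(pid, SIGTERM) < 0)
        return -1;

    /* Collect the exit status. If SIGCHLD is ignored this fails with
     * ECHILD, which is fine: the kernel reaped the child for us. */
    while (waitpid(pid, &status, 0) < 0 && errno == EINTR)
        continue;
    return 1;
}
```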

Another option is to double-fork() when running the SSH tunnel so it gets
inherited by init. Then we assume that SSH will die & exit when we close
the socket in doRemoteClose, i.e. closing our end of the socket should make
SSH get a SIGPIPE / EOF and exit, just as it would if the server closed
its end.
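
Roughly what I have in mind, as a sketch only: spawn_detached_helper is
a made-up name, and the grandchild here is a stand-in that reads until
EOF where the real code would exec ssh. The caller reaps only the
short-lived intermediate process, so it never holds a PID it might
later be tempted to kill().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: double-forks a process connected to the caller
 * through a socket pair. Returns the caller's end of the socket, or -1. */
int spawn_detached_helper(void)
{
    int sv[2], status;
    pid_t mid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    mid = fork();
    if (mid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (mid == 0) {
        /* Intermediate child: fork again and exit immediately, so the
         * grandchild is re-parented (to init or a subreaper). */
        pid_t helper;
        close(sv[0]);
        helper = fork();
        if (helper == 0) {
            /* Stand-in for ssh: consume input until EOF, then exit. */
            char buf[256];
            while (read(sv[1], buf, sizeof buf) > 0)
                continue;
            _exit(0);
        }
        _exit(helper < 0 ? 1 : 0);
    }

    close(sv[1]);
    /* Reap the intermediate child; it exits as soon as it has forked. */
    while (waitpid(mid, &status, 0) < 0) {
        if (errno != EINTR) {
            close(sv[0]);
            return -1;
        }
    }
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        close(sv[0]);
        return -1;
    }
    return sv[0];
}
```

After this returns, the caller has no child left to wait for, and
closing the returned descriptor is the only cleanup it needs to do.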

Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-           Perl modules: http://search.cpan.org/~danberr/              -=|
|=-               Projects: http://freshmeat.net/~danielpb/               -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=| 
