Re: EPIPE

Michael Natterer <mitschel@xxxxxxxxxxxxxxx> · Thu, 11 May 2000 20:54:44 +0200

Raphael Quinet wrote:
> 
> On Thu, 11 May 2000, Michael Natterer <mitschel@xxxxxxxxxxxxxxx> wrote:
> >
> > This is what currently happens (ok, it happens in the handler, but WNOHANG
> > *should* be absolutely safe).
> 
> No, actually it is not safe on all operating systems: as I wrote
> elsewhere, you cannot always rely on SA_NODEFER.  This means that in
> some cases, you could miss a SIGCHLD signal that occurs while you are
> still inside the handler but after the last test on waitpid().  If
> this happens, the main app will not see that one of the plug-ins has
> died (until another one dies and the handler collects the status for
> both).  That's why it is safer to make the tests outside the signal
> handler.  Otherwise, you could have a race condition on some systems
> (very seldom, but still...).

We don't use SA_NODEFER any more. And AFAIK the delivery of SIGCHLD has
nothing to do with cleaning up zombies: a zombie stays around until it is
waited for, whether or not a signal was delivered for it. That is why we
loop around waitpid(): POSIX explicitly says that signals arriving close
together may be merged by the kernel.
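
To make the loop concrete, here is a minimal sketch of the idea, not the
actual Gimp code. The names reap_children(), on_sigchld(),
install_sigchld_handler(), spawn_short_lived_child() and reaped_count are
made up for illustration, and the counter exists only so the effect can be
observed from outside the handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical counter, only here so the result is observable. */
static volatile sig_atomic_t reaped_count = 0;

/* Collect every child that has already exited, without ever blocking. */
static void reap_children(void)
{
    int status;
    pid_t pid;

    for (;;) {
        pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {
            reaped_count++;     /* one zombie collected, look for more */
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;           /* interrupted, simply try again */
        break;                  /* 0: nothing exited yet; -1: no children */
    }
}

static void on_sigchld(int signo)
{
    int saved_errno = errno;    /* waitpid() may clobber errno */

    (void) signo;
    reap_children();
    errno = saved_errno;
}

/* Note: no SA_NODEFER. One delivery may stand for several dead children. */
int install_sigchld_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Hypothetical helper for trying it out: fork a child that exits at once. */
pid_t spawn_short_lived_child(void)
{
    pid_t pid = fork();

    if (pid == 0)
        _exit(0);
    return pid;
}
```

If several children die at nearly the same time, the handler may run only
once, but the loop still collects all of them because it keeps calling
waitpid() until nothing is left to collect.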

> [...]
> > The usage of SIGCLD is strongly discouraged by Stevens and some Solaris
> > document I found recently. But Gimp uses SIGCHLD anyway.
> 
> And here is an excerpt from /usr/include/sys/signal.h on Solaris 2.6:
> 
> #define SIGCLD  18      /* child status change */
> #define SIGCHLD 18      /* child status change alias (POSIX) */
> 
> So it does not make much of a difference under recent versions of
> Solaris, at least... :-)  But they still say in some docs that the
> behavior of SIGCLD might change in future releases, so the POSIX
> version should be used in new programs.

Yeah, although Solaris is definitely more POSIX than Linux, they have
to cook their own stuff :) Stevens calls SIGCLD a "questionable practice".

> > > In one application that wanted to catch SIGSEGV, SIGBUS, SIGILL and
> > > SIGFPE, I created a handler that uses a direct call to write() on an
> > > fd that was previously obtained from fileno(stderr) (this fd is saved
> > > early so that the write() call can work even if the FILE *stderr is
> > > overwritten with garbage).  Doing this is safe, AFAIK.
> >
> > Yep, write() is safe. Gimp uses g_print() which is not really safe, but
> > then we call g_on_error_query() which definitely does a bit more than
> > what's allowed :)
> 
> Yes...  I wrote a few months ago that I would change that and implement
> some kind of --enable-stack-trace option, but I never took the time to
> do it.

Now it's there :) We just have to convert the remaining g_print() calls to
write(), and the handler will be totally safe if enable_stack_trace == FALSE.
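
Roughly what such a write()-only handler could look like, as a sketch
under assumptions rather than the real Gimp handler: the fd is saved early
from fileno(stderr) as Raphael described, the message is a fixed string,
and the handler ends with _exit(). All names here (saved_stderr_fd,
fatal_msg, remember_stderr_fd(), safe_report(), on_fatal_signal(),
install_fatal_handlers()) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Saved early, so the handler never has to touch the FILE *stderr. */
static int saved_stderr_fd = STDERR_FILENO;
static const char fatal_msg[] = "fatal signal caught, exiting\n";

/* Call once at startup, while stderr is still known to be sane. */
void remember_stderr_fd(void)
{
    saved_stderr_fd = fileno(stderr);
}

/* Uses write(2) only. Returns the number of bytes written, or -1. */
ssize_t safe_report(int fd, const char *msg, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, msg + done, len - done);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry the same chunk */
            return -1;
        }
        done += (size_t) n;
    }
    return (ssize_t) done;
}

static void on_fatal_signal(int signo)
{
    (void) signo;
    safe_report(saved_stderr_fd, fatal_msg, sizeof fatal_msg - 1);
    _exit(1);                   /* assumption: no stack trace, just leave */
}

int install_fatal_handlers(void)
{
    static const int signals[] = { SIGSEGV, SIGBUS, SIGILL, SIGFPE };
    struct sigaction sa;
    size_t i;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fatal_signal;
    sigemptyset(&sa.sa_mask);
    for (i = 0; i < sizeof signals / sizeof signals[0]; i++)
        if (sigaction(signals[i], &sa, NULL) != 0)
            return -1;
    return 0;
}
```

Nothing in the handler allocates memory, formats strings or goes through
stdio, which is the point of replacing g_print().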

> > From glib/gerror.c:
> >
> > /*
> >  * MT safe ; except for g_on_error_stack_trace, but who wants thread safety
> >  * then
> >  */
> 
> Note that being MT safe is not enough.  For the 4 signals that are
> listed above, you can usually expect that your memory is already
> corrupted.  So if you want to minimize the risks of crashing
> recursively inside the signal handler, you should avoid using
> variables as much as possible.  A handler for SIGSEGV is a good place
> for paranoia...

Like getting another SIGSEGV while glib asks if it should do a stack trace ?-)
That's e-v-i-l. But I'm afraid we can have either a waterproof SIGSEGV
handler or the trace, not both :(

We should probably re-add the reentrancy guards in the fatal handlers and
just do a brute force exit() if a handler is entered recursively (which can
only happen during the stack trace because that's the only case where the
signals are unblocked in the handler).
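
A rough sketch of such a guard, with made-up names (in_fatal_handler,
report_fd, stack_trace_hook, crashing_trace(), install_guarded_handler())
and two assumptions: it calls _exit() instead of exit(), because exit()
runs atexit handlers and is not async-signal-safe, and it uses SA_NODEFER
only to model "signals are unblocked in the handler" for the demo.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t in_fatal_handler = 0;

/* Hypothetical knobs so the behaviour can be observed from outside. */
int report_fd = STDERR_FILENO;
void (*stack_trace_hook)(void) = NULL;  /* stand-in for the stack trace */

static void say(const char *msg, size_t len)
{
    if (write(report_fd, msg, len) < 0) {
        /* nothing safe left to do about a failed write here */
    }
}

static void on_fatal_signal(int signo)
{
    static const char first[] = "fatal signal\n";
    static const char again[] = "fatal signal inside handler, giving up\n";

    (void) signo;
    if (in_fatal_handler) {
        /* Second entry: we crashed while handling the first crash. */
        say(again, sizeof again - 1);
        _exit(2);
    }
    in_fatal_handler = 1;
    say(first, sizeof first - 1);
    if (stack_trace_hook != NULL)
        stack_trace_hook();     /* the only step that may fault again */
    _exit(1);
}

int install_guarded_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fatal_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NODEFER;   /* demo only: allow recursive delivery */
    return sigaction(signo, &sa, NULL);
}

/* Hypothetical "stack trace" that faults again, to exercise the guard. */
void crashing_trace(void)
{
    raise(SIGSEGV);
}
```

With stack_trace_hook set to crashing_trace(), the first SIGSEGV prints the
first message, the second one hits the guard and the process leaves with
status 2 instead of recursing.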

> The other signal handlers do not need so much defensive programming.
> Being MT safe is usually enough.

I think so, too. The SIGCHLD handler is so trivial that it can hardly
cause any harm.

bye,
--Mitch