Re: EPIPE

quinet@xxxxxxxxxx (Raphael Quinet) · Wed, 10 May 2000 17:19:11 +0200 (MET DST)

On Wed, 10 May 2000, Michael Natterer <mitschel@xxxxxxxxxxxxxxx> wrote:
> Michael Natterer wrote:
> > 
> > Austin Donnelly wrote:
> > >
> > > [ two mails i totally agree with ]
> > 
> > I'm about to commit some code which should bring the signal
> > stuff into a sane state. The ChangeLog entry is quite verbose
> > and should explain how I tortured the code.
> 
> Um, 1 minute later I found a bad bug. No commit today :(

I don't know if this is relevant (I haven't seen the code that you
wanted to commit), but here are some general comments about signals
and when they are usually triggered in programs similar to the Gimp...

- SIGHUP, SIGINT, SIGQUIT, SIGTERM: usually triggered by the user or
  by the system shutting down.  They can be delivered at any time.
- SIGPIPE: an attempt to write() or send() something on a pipe or
  socket has failed because the other end has been closed.  This
  signal is usually triggered from within the system call, which is
  often called from inside a high-level function such as printf().
  Since printf() and the other stdio functions are not re-entrant
  (they are not on the POSIX list of async-signal-safe functions), it
  is a bad idea to call printf() or any stdio function in the signal
  handler.
- SIGSEGV: just say "Oops!" or "Eek!".  Some bug in the code has
  corrupted the memory.  Usually happens when trying to dereference a
  NULL pointer or a pointer that has been overwritten with garbage.
  This also happens quite often inside printf() or sprintf(), for
  example when you are printing some debugging messages (or error
  messages) and you did not think that some arguments may be NULL.
  For this reason, it is also a bad idea to call any stdio functions
  inside the handler for this signal.
- SIGBUS: mostly a variant of SIGSEGV.  Happens on many processors
  when you are trying to access some unaligned data.  Again, this is
  usually due to pointer corruption.
- SIGILL: on some processors that do not deliver SIGBUS in all cases,
  you can get a SIGILL if a pointer to a callback function was
  overwritten with garbage.  If the pointer is still referencing some
  area inside a code segment (so that you don't get a SIGSEGV) but not
  pointing to the start of a valid instruction, you will get the SIGILL.
  By the way, the Gimp does not catch this one.  Why?
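- SIGFPE: usually comes from a division by 0, although some other
  arithmetic errors (overflow, invalid operand) can also trigger it.
  On many systems only integer division traps by default; IEEE
  floating-point errors raise SIGFPE only if floating-point exceptions
  have been enabled.  When the signal does come from a floating-point
  operation, some processors or OSes can deliver it late.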
- SIGABRT: usually triggered by the application calling abort() or by a
  user who wants to get a core dump from a running process.  It can be
  caught by an application that wants to perform some specific cleanup
  tasks, but in most cases it should not be caught by a generic error
  handler.  I don't understand why the Gimp maps this to the generic
  gimp_fatal_error() function???
- SIGCHLD or SIGCLD: a child process died.  This signal can be
  delivered at any time.  Several children that exit close together
  may produce a single signal, and some systems do not provide a
  reliable way to know how many processes exited (if they do not
  support SA_NODEFER or if their waitpid() or wait3() calls are
  broken), so it is usually better to simply set a flag in the signal
  handler (without calling any wait*() function) and to reap the
  children outside the signal handler, calling a wait*() function in a
  loop until it reports that there are no more dead processes.  For
  example:
    while ((pid = waitpid (-1, &status, WNOHANG)) > 0)
      { ... /* check WIFEXITED(status) and other things */ }
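A minimal sketch of the SIGCHLD advice, assuming a POSIX system with
sigaction(): the handler only sets got_sigchld, and the main loop later
calls reap_children(), which loops on waitpid() with WNOHANG until no
dead child is left.  The names on_sigchld(), install_sigchld_handler()
and reap_children(), and the choice of SA_RESTART and SA_NOCLDSTOP, are
hypothetical, for illustration only; this is not the Gimp's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the handler, read and cleared by the main loop. */
static volatile sig_atomic_t got_sigchld = 0;

static void on_sigchld (int sig)
{
  (void) sig;
  got_sigchld = 1;   /* no wait*() call here, just remember it */
}

/* SA_RESTART keeps slow calls such as read() from failing with EINTR
   when a child exits; SA_NOCLDSTOP ignores stopped (not dead) children. */
int install_sigchld_handler (void)
{
  struct sigaction sa;
  memset (&sa, 0, sizeof sa);
  sa.sa_handler = on_sigchld;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
  return sigaction (SIGCHLD, &sa, NULL);
}

/* Called from the main loop.  Reaps every dead child, not just one,
   because several exits may have produced a single SIGCHLD.
   Returns the number of children reaped. */
int reap_children (void)
{
  int reaped = 0;
  int status;
  pid_t pid;

  if (!got_sigchld)
    return 0;
  got_sigchld = 0;   /* clear first: a child dying now sets it again */

  while ((pid = waitpid (-1, &status, WNOHANG)) > 0)
    {
      reaped++;
      if (WIFEXITED (status))
        { /* normal exit; WEXITSTATUS (status) is the exit code */ }
      else if (WIFSIGNALED (status))
        { /* killed by signal WTERMSIG (status) */ }
    }
  return reaped;
}
```

install_sigchld_handler() is called once at startup and reap_children()
from an idle function (GTK+) or after poll()/select() returns.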

In most of the applications that I wrote, the signal handlers do
nothing directly: they only set a flag that is checked by the main
loop (in an idle function for GTK+ apps, or after poll() or select()
for applications that use low-level calls).  I define one flag for
each signal (got_sigchld, got_sigterm, ...) and a master flag that
tells if any of the signal-specific flags have been set.  Sometimes I
also use counters instead of boolean flags, but on some systems the
counters are not reliable (especially if there is no SA_NODEFER) so
most of the time they are meaningless.
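A sketch of that flag scheme, assuming POSIX sigaction(): the flag names
follow the got_sigterm style above, while flag_handler(),
install_flag_handlers() and process_signal_flags() are hypothetical
names.  All flags are volatile sig_atomic_t, the only kind of object the
C standard lets a signal handler assign to safely.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_signal = 0;   /* master flag */
static volatile sig_atomic_t got_sigterm = 0;
static volatile sig_atomic_t got_sighup = 0;
static volatile sig_atomic_t got_sigint = 0;

/* The handler does nothing but set flags. */
static void flag_handler (int sig)
{
  switch (sig)
    {
    case SIGTERM: got_sigterm = 1; break;
    case SIGHUP:  got_sighup = 1;  break;
    case SIGINT:  got_sigint = 1;  break;
    }
  got_signal = 1;   /* tells the main loop that something was set */
}

int install_flag_handlers (void)
{
  struct sigaction sa;
  memset (&sa, 0, sizeof sa);
  sa.sa_handler = flag_handler;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = SA_RESTART;
  if (sigaction (SIGTERM, &sa, NULL) < 0
      || sigaction (SIGHUP, &sa, NULL) < 0
      || sigaction (SIGINT, &sa, NULL) < 0)
    return -1;
  return 0;
}

/* Called by the main loop (idle function in GTK+, or after poll() or
   select() returns).  Outside the handler, any function may be called.
   Returns 1 if the program should quit. */
int process_signal_flags (void)
{
  if (!got_signal)
    return 0;          /* common path: a single test */
  got_signal = 0;

  if (got_sighup)
    {
      got_sighup = 0;
      /* e.g. reload configuration here */
    }
  if (got_sigterm || got_sigint)
    {
      got_sigterm = 0;
      got_sigint = 0;
      return 1;
    }
  return 0;
}
```

The master flag keeps the per-iteration cost of the main loop down to
one test when no signal has arrived.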

In one application that wanted to catch SIGSEGV, SIGBUS, SIGILL and
SIGFPE, I created a handler that uses a direct call to write() on an
fd that was previously obtained from fileno(stderr) (this fd is saved
early so that the write() call can work even if the FILE *stderr is
overwritten with garbage).  Doing this is safe, AFAIK: write() is one
of the functions that POSIX allows in a signal handler.
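A sketch of such a handler, assuming POSIX sigaction();
install_fatal_handlers(), fatal_handler() and saved_stderr_fd are
hypothetical names.  After writing its message with write(), this
version restores the default action and re-raises the signal so that a
core dump can still be produced; that last step is an assumption of
this sketch, not something described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Saved at startup so the handler never touches FILE *stderr. */
static int saved_stderr_fd = 2;

static void fatal_handler (int sig)
{
  /* Only async-signal-safe calls here: no stdio, no malloc. */
  static const char msg[] = "Eek! Fatal signal caught, exiting.\n";
  ssize_t r = write (saved_stderr_fd, msg, sizeof msg - 1);
  (void) r;
  /* Back to the default action; the re-raised signal stays pending
     until the handler returns, then terminates the process. */
  signal (sig, SIG_DFL);
  raise (sig);
}

int install_fatal_handlers (void)
{
  static const int sigs[] = { SIGSEGV, SIGBUS, SIGILL, SIGFPE };
  struct sigaction sa;
  size_t i;

  saved_stderr_fd = fileno (stderr);   /* early, while stderr is intact */

  memset (&sa, 0, sizeof sa);
  sa.sa_handler = fatal_handler;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = 0;
  for (i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
    if (sigaction (sigs[i], &sa, NULL) < 0)
      return -1;
  return 0;
}
```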

In most cases, I ignore SIGPIPE (or I only increment a counter for
debugging purposes) because the best way to deal with this is to check
the return value of the write() or send() calls, which fail with errno
set to EPIPE once SIGPIPE no longer kills the process, or to check
whether a later read() returns 0.
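A sketch of the "ignore SIGPIPE and check the return value" approach,
assuming POSIX; write_all() and demo_epipe() are hypothetical helpers.
With SIGPIPE set to SIG_IGN, writing to a pipe whose read end is closed
makes write() return -1 with errno set to EPIPE, which the caller can
handle like any other I/O error.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Write all of buf to fd.  Returns 0 on success, -1 on error with
   errno set (EPIPE means the other end went away). */
int write_all (int fd, const void *buf, size_t len)
{
  const char *p = buf;
  while (len > 0)
    {
      ssize_t n = write (fd, p, len);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;        /* interrupted by a signal: retry */
          return -1;
        }
      p += n;
      len -= (size_t) n;
    }
  return 0;
}

/* Ignore SIGPIPE, then write to a pipe with no reader left.
   Returns the errno of the failed write (0 if it unexpectedly worked,
   -1 if the pipe could not be created). */
int demo_epipe (void)
{
  int fds[2];
  int saved;

  signal (SIGPIPE, SIG_IGN);
  if (pipe (fds) < 0)
    return -1;
  close (fds[0]);                 /* close the read end */
  errno = 0;
  if (write_all (fds[1], "x", 1) == 0)
    saved = 0;
  else
    saved = errno;                /* save before close() can touch it */
  close (fds[1]);
  return saved;
}
```

Without the signal() call, the same write would kill the process with
SIGPIPE before write_all() could look at the return value.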

Just my 0.02 Euro.  But you probably knew all of this already...

-Raphael