Re: EPIPE

Austin Donnelly <austin@xxxxxxxx> · Tue, 9 May 2000 12:44:41 +0100 (BST)

On Monday, 8 May 2000, Michael Natterer wrote:

> Unfortunately this is not the reason why gimp dies on just any aborting
> child. Although I 100% agree that SIGPIPE being fatal is the wrong thing
> to do. I browsed CVS and Gimp is connecting SIGPIPE to on_signal() since
> the beginning of CVS time.

Ok, fair enough.  But gimp used to be able to survive its plugins
dying in the past.  I'm not sure when this was last working.

> There is also an infinite loop using 30% of my cpu time in all cases where
> a dying plugin does _not_ kill gimp.

Fvwm2 used to have a bug very similar to this.  It turned out to be
trying to do too much processing in some signal hander function.

> [Mitch on temp procs not working from plugins]

This seems to be a separate, probably unrelated, bug.

On Monday, 8 May 2000, Tim Mooney wrote:

> The SIGPIPE problem is because on_signal is currently treating it as a
> fatal signal (see the case on_signal in app/main.c).  The on_signal routine
> should probably be modified to not treat SIGPIPE as fatal.  That should fix
> the problem Austin is seeing (that others will no doubt see too).  Someone
> should investigate the handler in 1.1.19 or earlier, and see what was being
> done on SIGPIPE there.

On Monday, 8 May 2000, Tim Mooney wrote:

> So why is Austin observing the problem now?  I'm not sure.

I'm noting that 2-3 recent bug reports have been related to plugins
crashing (the main reason the bug was filed) but as a side effect, the
main gimp process is also dying.  I happened to look into one of these
very briefly, and noted the strange SIGPIPE -> on_signal() -> SIGSEGV
behaviour.

> It does seem
> like it's behaving as expected based on the code.  The thing to try would
> be to use a different signal handler for SIGPIPE, that doesn't terminate
> the gimp.  Perhaps just ignore the signal and let the plug-in protocol detect
> the problem (assuming it can?)?

What happens if we use SIGIGN as the handler for SIGPIPE.  That,
combined with using EPIPE from g_io_channel_{read,write}() should give
us everything we need, without needing the overly complex gimp_nanny()
stuff being proposed earlier.  Remember: KISS - Keep It Simple, Stupid.

On Monday, 8 May 2000, Michael Natterer wrote:

> Nope, it unfortunately has another reason. I can't explain why it used to
> work with SIGPIPE being fatal but the diffs of app/main.c show no semantical
> changes at all down to CVS version 1.1.

So, maybe we never used to get SIGPIPEs raised before.  Maybe we were
just lucky.  I still think we need to deal with them properly.

> On Monday, 8 May 2000, Tim Mooney wrote:
> > Austin is also correct that calling printf from the handler is probably
> > asking for trouble.  If a message must be written, some other method should
> > be found (write *is* ok from a signal handler, but won't using *that* be
> > fun...).
> 
> Hm, I guess printf from a signal handler which aborts the program should be
> totally safe

No it is not.  Consider a libc implementation where printf takes out a
mutex while appending to the stdio output buffer.  If the signal is
delivered while printf has this lock held, any attempt to call printf
again will deadlock.  You lose.  This situation may occur if you call
_any_ library code you didn't write, so the conclusion is that calling
someone else's code from a signal handler is a very bad idea, unless
you're guaranteed the code is fully re-entrant.  Section 10.6 of
Advanced Programming in the UNIX Environment (W. Richard Stevens)
lists the POSIX1 mandated re-entrant functions.  Printf, malloc etc
are not in the list, unsuprisingly.

On Monday, 8 May 2000, Michael Natterer wrote:

> We should probably ignore SIGPIPE totally and let a more sophisticated
> SIGCHLD handler do the work:
> 
> - record which processes have been started ("gimp_nanny()")
> - on SIGCHLD check if it crashed...
> - ...and somehow clean up the plugin's struct and io channels
>      (and show a proper error message)
> 
> The problem with this may be that we need a periodic function doing the cleanup
> (just like the shell prints it's "bla exited [SIGwhatever]" message before the
> prompt) because we cannot do it from the handler.
> 
> OTOH it should be ok to call this cleanup function from two places:
> 
> 1. whenever read/write from/to a plugin pipe fails.
> 2. in an idle function.
> 
> The "gimp_fork()" and signal handler stuff I proposed during the SIGCHLD
> discussion might so the job but it would be much nicer to fix it to work like
> before

SIGPIPE indicates a write(2) into the pipe when the plugin has died.
If SIGPIPE is being ignored (or the signal handler returns) then the
write(2) returns EPIPE.  If you read(2) from a pipe when the plugin
has died, you get back 0 bytes (ie the usual EOF condition).

Thus, we don't need complex gimp_nanny() schemes.  We should SIG_IGN
SIGPIPE.  If a write to a plugin causes EPIPE, then we know the plugin
is dead, and we can call some mourn() function to cleanup after it.
If we read from a plugin and get an EOF when we weren't expecting it,
we can similarly mourn().  We know the plugin concerned, since the
upper levels of the plugin code know which plugin the message was to
have been read from/send to.

There: no need to wrap all forks, or do hideously complex things in a
special SIGCHLD handler.  Easy.

BTW, SIGCLD does _not_ have the same semantics as SIGCHLD (note the
'H').  See the Stevens book for the (gorey) details.

Austin