Re: postmaster deadlock while logging after syslogger exited

Andres Freund <andres@xxxxxxxxxxx> · Thu, 16 Nov 2017 18:54:38 -0800

On 2017-11-16 21:39:49 -0500, Tom Lane wrote:
> > We could work around a situation like that if we made postmaster use a
> > *different* pipe as stderr than the one we're handing to normal
> > backends. If postmaster created a new pipe and closed the read end
> > whenever forking a syslogger, we should get EPIPEs when writing after
> > syslogger died and could fall back to proper stderr or such.
> 
> I think that's nonsense, unfortunately.

Nice phrasing.

> If the postmaster had its own pipe, that would reduce the risk of this
> deadlock because only the postmaster would be filling that pipe, not
> the postmaster and all its other children --- but it wouldn't
> eliminate the risk.

The deadlock happens because postmaster is waiting for syslogger accept
a message, and syslogger waits for postmaster to restart it. To resolve
the deadlock postmasterneeds to not wait for a dead sylogger, even if it
hasn't yet received & processed the SIGCLD - what other postmaster
children do or don't do doesn't matter for resolving that cycle. The
reason postmaster currently block on writing to the pipe, instead of
getting EPIPE, is because both ends of the pipe are still
existing. Which in turn is the case because we need to be able to
restart syslogger without passing a new file descriptor to all
subprocesses.  If postmaster instead uses a different pipe to write to
it'll not block anymore, instead getting EPIPE, and can continue towards
starting a new syslogger.  So I don't think the described deadlock
exists if we were to apply my proposed fix.

What this obviously would not *not* guarantee is being able start a new
syslogger, but it seems fairly impossible to guarantee that. So sure,
other processes would still block until syslogger has successfully
restarted - but it's a resolvable situation rather than a hard deadlock,
which the described situation appears to be.

Note that there's plenty of cases where you could run into this even
without being unable to fork new processes. You'd e.g. could also run
into this while logging the exit of some other subprocess or such,
there's enough ereports in postmaster.

> I doubt the increase in reliability would be enough to justify the
> extra complexity and cost.

I'm doubtful about that too.

> What might be worth thinking about is allowing the syslogger process to
> inherit the postmaster's OOM-kill-proofness setting, instead of dropping
> down to the same vulnerability as the postmaster's other child processes.
> That presumes that this was an otherwise-unjustified OOM kill, which
> I'm not quite sure of ... but it does seem like a situation that could
> arise from time to time.

Hm. I'm a bit scared about that - it doesn't seem that inconceivable
that various backends log humongous multi-line messages, leading to
syslogger *actually* taking up a fair amount of memory. Note that we're
using plain stringinfos that ereport(ERROR) out of memory situations,
rather than failing more gracefully.

- Andres

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general