Re: What to do when dynamic shared memory control segment is corrupt

Tom Lane <tgl@xxxxxxxxxxxxx> · Mon, 18 Jun 2018 12:30:13 -0400

Sherrylyn Branchaw <sbranchaw@xxxxxxxxx> writes:
> We are using Postgres 9.6.8 (planning to upgrade to 9.6.9 soon) on RHEL 6.9.
> We recently experienced two similar outages on two different prod
> databases. The error messages from the logs were as follows:
> LOG:  server process (PID 138529) was terminated by signal 6: Aborted

Hm ... were these installations built with --enable-cassert?  If not,
an abort trap seems pretty odd.
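
One way to answer the --enable-cassert question is to look at the configure options the binaries report. What follows is only a sketch: it assumes the pg_config on the PATH belongs to the same installation as the running server, and the helper names are made up for illustration.

```python
import shlex
import subprocess


def built_with_cassert(configure_output: str) -> bool:
    """Return True if '--enable-cassert' is among the configure options.

    `configure_output` is expected to look like the output of
    `pg_config --configure`, i.e. a line of shell-quoted options.
    """
    return "--enable-cassert" in shlex.split(configure_output)


def configure_options_from_pg_config() -> str:
    """Hypothetical helper: run `pg_config --configure` and return its output.

    Assumes pg_config is on the PATH and matches the server in question.
    """
    result = subprocess.run(
        ["pg_config", "--configure"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On a running server, `SHOW debug_assertions;` reports the same thing from SQL.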

> In one case, the logs recorded
> LOG:  all server processes terminated; reinitializing
> LOG:  incomplete data in "postmaster.pid": found only 1 newlines while
> trying to add line 7
> ...

> In the other case, the logs recorded
> LOG:  all server processes terminated; reinitializing
> LOG:  dynamic shared memory control segment is corrupt
> LOG:  incomplete data in "postmaster.pid": found only 1 newlines while
> trying to add line 7
> ...

Those "incomplete data" messages are quite unexpected and disturbing.
I don't know of any mechanism within Postgres proper that would result
in corruption of the postmaster.pid file that way.  (I wondered briefly
if trying to start a conflicting postmaster would result in such a
situation, but experimentation here says not.)  I'm suspicious that
this may indicate a bug or unwarranted assumption in whatever scripts
you use to start/stop the postmaster.  Whether that is at all related
to your crash issue is hard to say, but it bears looking into.
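
When auditing start/stop scripts, it can help to see how many complete lines postmaster.pid actually holds at a given moment. The sketch below reads the log message literally: "found only 1 newlines while trying to add line 7" is taken to mean that the file held one newline when at least six were needed before line 7 could be added. That reading is an assumption, as are the function names and the path handling.

```python
from pathlib import Path


def count_newlines(text: str) -> int:
    """Number of newline characters, i.e. complete lines, in the text."""
    return text.count("\n")


def can_add_line(text: str, target_line: int) -> bool:
    """True if the text already has the target_line - 1 complete lines
    that must precede line number `target_line`."""
    return count_newlines(text) >= target_line - 1


def check_pid_file(data_dir: str, target_line: int = 7) -> bool:
    """Hypothetical helper: apply the check to <data_dir>/postmaster.pid."""
    text = Path(data_dir, "postmaster.pid").read_text()
    return can_add_line(text, target_line)
```

A file containing only the PID line would fail this check for line 7, which matches the count in the reported message.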

> My question is whether the corrupt shared memory control segment, and the
> failure of Postgres to automatically restart, mean the database should not
> be automatically started up, and if there's something we should be doing
> before restarting.

No, that looks like fairly typical crash recovery to me: corrupt shared
memory contents are expected and recovered from after a crash.  However,
we don't expect postmaster.pid to get mucked with.

			regards, tom lane