Sherrylyn Branchaw <sbranchaw@xxxxxxxxx> writes:
> We are using Postgres 9.6.8 (planning to upgrade to 9.6.9 soon) on RHEL 6.9.
> We recently experienced two similar outages on two different prod
> databases. The error messages from the logs were as follows:

> LOG: server process (PID 138529) was terminated by signal 6: Aborted

Hm ... were these installations built with --enable-cassert? If not,
an abort trap seems pretty odd.

> In one case, the logs recorded

> LOG: all server processes terminated; reinitializing
> LOG: incomplete data in "postmaster.pid": found only 1 newlines while
> trying to add line 7
> ...

> In the other case, the logs recorded

> LOG: all server processes terminated; reinitializing
> LOG: dynamic shared memory control segment is corrupt
> LOG: incomplete data in "postmaster.pid": found only 1 newlines while
> trying to add line 7
> ...

Those "incomplete data" messages are quite unexpected and disturbing.
I don't know of any mechanism within Postgres proper that would result
in corruption of the postmaster.pid file that way. (I wondered briefly
if trying to start a conflicting postmaster would result in such a
situation, but experimentation here says not.) I'm suspicious that
this may indicate a bug or unwarranted assumption in whatever scripts
you use to start/stop the postmaster. Whether that is at all related
to your crash issue is hard to say, but it bears looking into.

> My question is whether the corrupt shared memory control segment, and the
> failure of Postgres to automatically restart, mean the database should not
> be automatically started up, and if there's something we should be doing
> before restarting.

No, that looks like fairly typical crash recovery to me: corrupt shared
memory contents are expected and recovered from after a crash. However,
we don't expect postmaster.pid to get mucked with.

			regards, tom lane
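
For anyone who wants to see what a start/stop script has left in postmaster.pid, here is a rough Python sketch of the kind of check the "found only 1 newlines while trying to add line 7" message implies. It is not code from Postgres. The function names and the field labels are assumptions made for illustration, and the labels reflect my understanding of the 9.6-era file layout, so verify them against your own installation. The idea is that adding line N needs at least N-1 newline-terminated lines already in the file.

```python
# Illustrative sketch only; names and field labels are assumptions,
# not taken from the Postgres sources.
ASSUMED_FIELDS = [
    "postmaster PID",
    "data directory",
    "start time",
    "port",
    "socket directory",
    "listen address",
    "shared memory key",
]


def inspect_pid_file(text):
    """Return (newline_count, [(label, value), ...]) for pid-file contents."""
    newline_count = text.count("\n")
    lines = text.split("\n")
    # A file that ends with a newline yields one trailing empty element.
    if lines and lines[-1] == "":
        lines.pop()
    # zip() stops at the shorter sequence, so extra or missing lines are safe.
    return newline_count, list(zip(ASSUMED_FIELDS, lines))


def looks_truncated(text, needed_line=7):
    """True if there are too few complete lines to add line `needed_line`."""
    return text.count("\n") < needed_line - 1
```

To try it on a copy of the file, read the copy with `open(path).read()` and pass the resulting string to both functions. A file holding only the PID line would be reported as truncated, which matches the situation in the quoted log.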