Re: psql: FATAL: the database system is starting up

Adrian Klaver <adrian.klaver@xxxxxxxxxxx> · Wed, 29 May 2019 07:28:24 -0700

On 5/28/19 6:59 PM, Tom K wrote:

On Tue, May 28, 2019 at 9:53 AM Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:

Correct.  Master election occurs through Patroni.  WAL level is set to:

wal_level = 'replica'

So no archiving.

     >
     > After the most recent crash 2-3 weeks ago, the cluster is now
     > running into this message but I'm not able to make heads or tails
     > out of why it's throwing this:

    So you have not been able to run the cluster for the past 2-3 weeks,
    or is that more recent?

Haven't been able to bring this PostgreSQL cluster up (run the cluster)
since 2-3 weeks ago.  Tried quite a few combinations of options to
recover this.  No luck.  Had storage failures earlier, even with
corrupted OS files, but this PostgreSQL cluster w/ Patroni was able to
come up each time without any recovery effort on my part.

    When you refer to history files below are you talking about WAL
    files or something else?

    Is this:

    "recovery command file "recovery.conf" specified neither
    primary_conninfo nor restore_command"

    true?

True. recovery.conf is controlled by Patroni.  Contents of this file 
remained the same for all the cluster nodes with the exception of the 
primary_slot_name:

[root@psql01 postgresql-patroni-etcd]# cat recovery.conf
primary_slot_name = 'postgresql0'
standby_mode = 'on'
recovery_target_timeline = 'latest'
[root@psql01 postgresql-patroni-etcd]#

[root@psql02 postgres-backup]# cat recovery.conf
primary_slot_name = 'postgresql1'
standby_mode = 'on'
recovery_target_timeline = 'latest'
[root@psql02 postgres-backup]#

[root@psql03 postgresql-patroni-backup]# cat recovery.conf
primary_slot_name = 'postgresql2'
standby_mode = 'on'
recovery_target_timeline = 'latest'
[root@psql03 postgresql-patroni-backup]#
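
As an illustration of why the quoted log message applies to all three
files above, here is a minimal Python sketch that reads a recovery.conf
and checks whether it names a WAL source (primary_conninfo or
restore_command). The function names are hypothetical and the parser is
a simplification (it only handles plain name = 'value' lines); it is not
how PostgreSQL or Patroni parse the file.

```python
def parse_recovery_conf(text):
    """Parse simple name = 'value' lines; skips blanks and # comments.

    Simplified on purpose: no escaped quotes, no trailing comments.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue
        settings[name.strip()] = value.strip().strip("'")
    return settings


def has_wal_source(settings):
    """True if the file says where to fetch WAL from."""
    return bool(settings.get("primary_conninfo")
                or settings.get("restore_command"))


# Contents of recovery.conf on psql01, as posted above.
SAMPLE = """\
primary_slot_name = 'postgresql0'
standby_mode = 'on'
recovery_target_timeline = 'latest'
"""

if __name__ == "__main__":
    conf = parse_recovery_conf(SAMPLE)
    if not has_wal_source(conf):
        print("neither primary_conninfo nor restore_command is set")
```

With the posted contents the check reports that neither setting is
present, which matches the message in the log.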

I've made a copy of the root postgres directory over to another location 
so when troubleshooting, I can always revert to the first state the 
cluster was in when it failed.

I have no experience with Patroni so I will be of no help there. You 
might get more useful information from:

https://github.com/zalando/patroni
Community

There are two places to connect with the Patroni community: on github, 
via Issues and PRs, and on channel #patroni in the PostgreSQL Slack. If 
you're using Patroni, or just interested, please join us.

That being said, can you start the copied Postgres instance without 
using the Patroni instrumentation?
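
One possible way to try that, sketched in Python: build a plain pg_ctl
start command pointed at the copied data directory. The data directory,
log file path and port below are placeholders, not values from this
thread, and the sketch assumes Patroni is stopped on that node and that
pg_ctl is on the PATH of the postgres OS user. It only builds and prints
the command; it does not run it.

```python
import shlex


def build_pg_ctl_start(data_dir, port, log_file):
    """Return the argv for starting a standalone instance with pg_ctl.

    -D selects the data directory, -l the server log file, and -o passes
    options to the postgres server itself (here a non-default port so the
    copy does not collide with anything else on the host).
    """
    return [
        "pg_ctl",
        "-D", data_dir,
        "-l", log_file,
        "-o", f"-p {port}",
        "start",
    ]


if __name__ == "__main__":
    # Placeholder paths and port; substitute the location of the copy.
    cmd = build_pg_ctl_start("/path/to/copied/data", 5433, "/tmp/pg-copy.log")
    print(" ".join(shlex.quote(part) for part in cmd))
```

The server log written to the -l file is where any startup or recovery
errors from the copy would show up.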

Thx,
TK

--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx