Search Postgresql Archives

Re: psql: FATAL: the database system is starting up

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 5/27/19 9:59 PM, Tom K wrote:
Hey Guy's,


I'm running Patroni w/ PostgreSQL 10, ETCD, Haproxy and Keepalived on 3 RHEL 7.6 VM's.  Every now and then the underlying storage crashes taking out the cluster.  On recovery, PostgreSQL tends to come up while other databases just blow up.  That is until now.

The above is one master and two standby servers connected via streaming replication, correct?

Is there WAL archiving set up?



After the most recent crash 2-3 weeks ago, the cluster is now running into this message but I'm not able to make heads or tails out of why it's throwing this:

So you have not been able to run the cluster the past 2-3 weeks or is that more recent?

When you refer to history files below are you talking about WAL files or something else?

Is this:

"recovery command file "recovery.conf" specified neither primary_conninfo nor restore_command"

true?




|-bash-4.2$/usr/pgsql-10/bin/postgres -D /data/patroni --config-file=/data/patroni/postgresql.conf2019-05-2308:26:34.561EDT [10101]LOG:listening onIPv4 address "192.168.0.108",port 54322019-05-2308:26:34.565EDT [10101]LOG:listening onUnix socket "./.s.PGSQL.5432"2019-05-2308:26:34.591EDT [10101]LOG:redirecting log output tologging collector process 2019-05-2308:26:34.591EDT [10101]HINT:Future log output will appear indirectory "log".^C-bash-4.2/usr/pgsql-10/bin/postgres -D /data/patroni --config-file=/data/patroni/postgresql.conf --hot_standby=on --listen_addresses=192.168.0.108 --max_worker_processes=8 --max_locks_per_transaction=64 --wal_level=replica --cluster_name=postgres --wal_log_hints=on --max_wal_senders=10 --track_commit_timestamp=off --max_prepared_transactions=0 --port=5432 --max_replication_slots=10 --max_connections=1002019-05-2308:26:45.707EDT [10105]LOG:listening onIPv4 address "192.168.0.108",port 54322019-05-2308:26:45.715EDT [10105]LOG:listening onUnix socket "./.s.PGSQL.5432"2019-05-2308:26:45.740EDT [10105]LOG:redirecting log output tologging collector process 2019-05-2308:26:45.740EDT [10105]HINT:Future log output will appear indirectory "log".^C-bash-4.2$/usr/pgsql-10/bin/postgres -D /data/patroni --config-file=/data/patroni/postgresql.conf --hot_standby=on --listen_addresses=192.168.0.1 --max_worker_processes=8 --max_locks_per_transaction=64 --wal_level=replica --cluster_name=postgres --wal_log_hints=on --max_wal_senders=10 --track_commit_timestamp=off --max_prepared_transactions=0 --port=5432 --max_replication_slots=10 --max_connections=1002019-05-2308:29:36.014EDT [10188]LOG:listening onIPv4 address "192.168.0.108",port 54322019-05-2308:29:36.017EDT [10188]LOG:listening onUnix socket "./.s.PGSQL.5432"2019-05-2308:29:36.040EDT [10188]LOG:redirecting log output tologging collector process 2019-05-2308:29:36.040EDT [10188]HINT:Future log output will appear indirectory "log".|

The database logs only print out this:

|2019-05-2308:25:44.476EDT [10088]HINT:The databaseserver will regularly poll the pg_wal subdirectory tocheckforfiles placed there.2019-05-2308:25:44.477EDT [10088]LOG:entering standby mode 2019-05-2308:25:50.209EDT [10086]LOG:received fast shutdownrequest 2019-05-2308:25:50.215EDT [10089]LOG:shutting down 2019-05-2308:25:50.221EDT [10086]LOG:databasesystem isshut down 2019-05-2308:26:11.560EDT [10094]LOG:databasesystem was shut down inrecovery at 2019-05-2308:25:50EDT 2019-05-2308:26:11.560EDT [10094]WARNING:recovery command file"recovery.conf"specified neither primary_conninfo nor restore_command 2019-05-2308:26:11.560EDT [10094]HINT:The databaseserver will regularly poll the pg_wal subdirectory tocheckforfiles placed there.2019-05-2308:26:11.562EDT [10094]LOG:entering standby mode 2019-05-2308:26:17.817EDT [10092]LOG:received fast shutdownrequest 2019-05-2308:26:17.824EDT [10095]LOG:shutting down 2019-05-2308:26:17.831EDT [10092]LOG:databasesystem isshut down 2019-05-2308:26:34.596EDT [10103]LOG:databasesystem was shut down inrecovery at 2019-05-2308:26:17EDT 2019-05-2308:26:34.596EDT [10103]WARNING:recovery command file"recovery.conf"specified neither primary_conninfo nor restore_command 2019-05-2308:26:34.596EDT [10103]HINT:The databaseserver will regularly poll the pg_wal subdirectory tocheckforfiles placed there.2019-05-2308:26:34.598EDT [10103]LOG:entering standby mode 2019-05-2308:26:43.025EDT [10101]LOG:received fast shutdownrequest 2019-05-2308:26:43.032EDT [10104]LOG:shutting down 2019-05-2308:26:43.038EDT [10101]LOG:databasesystem isshut down 2019-05-2308:26:45.748EDT [10107]LOG:databasesystem was shut down inrecovery at 2019-05-2308:26:43EDT 2019-05-2308:26:45.748EDT [10107]WARNING:recovery command file"recovery.conf"specified neither primary_conninfo nor restore_command 2019-05-2308:26:45.748EDT [10107]HINT:The databaseserver will regularly poll the pg_wal subdirectory tocheckforfiles placed there.|

and when connecting, I get this cryptic message:

|[root@psql01 log]#psql -U postgres -h 192.168.0.108psql:FATAL:the databasesystem isstarting up [root@psql01 log]#|

There's no space issue here.  The cluster stays in recovery. This state doesn't change and the DB is inaccessible.

So I ran the above in debug mode ( -d 5 ) and got this:


|DEBUG:mmap(148897792)withMAP_HUGETLB failed,huge pages disabled:Cannot allocate memory|

so I freed memory on the underlying physical this VM sits on but no luck. Same memory error and same result as above.

When patroni is started on just this one node (troubleshooting purposes) I get the message:

|[root@psql01 ~]#ps -ef|grep -Ei "patroni|postgres"root 22172188000:38pts/100:00:00tail -f postgresql-Thu.log postgres 25121400:42?00:00:01/usr/bin/python2 /bin/patroni /etc/patroni.yml postgres 25331000:42?00:00:00/usr/pgsql-10/bin/postgres -D /data/patroni --config-file=/data/patroni/postgresql.conf --hot_standby=on --listen_addresses=192.168.0.108 --max_worker_processes=8 --max_locks_per_transaction=64 --wal_level=replica --cluster_name=postgres --wal_log_hints=on --max_wal_senders=10 --track_commit_timestamp=off --max_prepared_transactions=0 --port=5432 --max_replication_slots=10 --max_connections=100postgres 25352533000:42?00:00:00postgres:postgres:logger process postgres 25362533000:42?00:00:00postgres:postgres:startup process waiting for000000010000000000000008root 26642039000:42pts/000:00:00grep --color=auto -Ei patroni|postgres[root@psql01 ~]#|

The system prints "postgres: postgres: startup process waiting for 000000010000000000000008"

Seems it's trying to replay the history log but can't? Running an strace on the file reveals that it's looking for the following:

|read(5,0x7fff9cb4eb87,1)=-1EAGAIN (Resource temporarily unavailable)read(5,0x7fff9cb4eb87,1)=-1EAGAIN (Resource temporarily unavailable)open("pg_wal/00000098.history",O_RDONLY)=-1ENOENT (No such fileordirectory)epoll_create1(EPOLL_CLOEXEC)=3|

But this file doesn't exist in the directory on this particular cluster node (psql01).  I copy the missing history files off another cluster node to this one and it does move forward but just get's stuck on another history file.  But same error message as above on psql01.

So I give up and try to recover the secondary node (psql02).  But this yields this error:

|PANIC:replicationcheckpointhas wrong magic 0instead of307747550|

and I'm not able to find very much to move me forward on this one either.

So again I give up after a few hours of trying to recover psql02 and move on to the third node.  On the third cluster node (pgsql03) I get this error which I'm also not able to find much info on:

|FATAL:syntax error inhistory file:f2W|

Hoping I'm able to recover the data from at least one of the nodes, make it primary, and get it to replicate to the others.

Thx, TK



--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux