
Slave promotion problem...


 



Last week we had some problems on the master server which triggered a failover to the slave (the master was completely unresponsive, for reasons still unknown). The slave received the promote signal (pg_ctl promote) and logged it:

2015-08-28 23:05:10 UTC [6]: [50-1] user=,db= LOG: received promote request
2015-08-28 23:05:10 UTC [467]: [2-1] user=,db= FATAL: terminating walreceiver process due to administrator command

Five hours later the slave still had not promoted. Meanwhile we fixed the master and restarted it. The slave was then restarted, and it behaved as if the promote signal had never arrived, connecting to the master as a regular slave.
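A stalled promotion like this can be caught by polling the standby after issuing the promote, instead of assuming it succeeded. The following is only a minimal sketch: `wait_for_promotion` and its `in_recovery` argument are hypothetical names, and `in_recovery` is assumed to be a callable that runs `SELECT pg_is_in_recovery()` on the slave through whatever client the failover tooling already uses, returning the result as a boolean.

```python
import time
from typing import Callable


def wait_for_promotion(
    in_recovery: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the server leaves recovery or the timeout expires.

    in_recovery is a hypothetical callable, assumed to wrap
    "SELECT pg_is_in_recovery()" executed against the standby.
    Returns True if the promotion finished, False if it timed out.
    """
    deadline = clock() + timeout_s
    while True:
        if not in_recovery():
            return True  # server has left recovery: promotion finished
        if clock() >= deadline:
            return False  # still in recovery: raise an alert
        sleep(interval_s)
```

A False return would have flagged the first incident within the chosen timeout rather than hours later. The timeout value is a judgment call, since replaying outstanding WAL before the promotion completes can legitimately take a while.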

Because of maintenance we had to issue a failover a few days later, and this time the failover was successful:

2015-08-30 19:40:08 UTC [312]: [2-1] user=,db= LOG: replication terminated by primary server
2015-08-30 19:40:08 UTC [312]: [3-1] user=,db= DETAIL: End of WAL reached on timeline 3 at 1AC/4D000090.
2015-08-30 19:40:08 UTC [312]: [4-1] user=,db= FATAL: could not send end-of-streaming message to primary: no COPY in progress
2015-08-30 19:40:08 UTC [6]: [34-1] user=,db= LOG: invalid record length at 1AC/4D000090
2015-08-30 19:40:10 UTC [6]: [35-1] user=,db= LOG: received promote request
2015-08-30 19:40:13 UTC [6]: [36-1] user=,db= LOG: redo done at 1AC/4D000028
2015-08-30 19:40:13 UTC [6]: [37-1] user=,db= LOG: last completed transaction was at log time 2015-08-30 19:40:07.18114+00
2015-08-30 19:40:14 UTC [6]: [38-1] user=,db= LOG: selected new timeline ID: 4
2015-08-30 19:40:14 UTC [6]: [39-1] user=,db= LOG: restored log file "00000003.history" from archive
2015-08-30 19:40:14 UTC [6]: [40-1] user=,db= LOG: archive recovery complete
2015-08-30 19:40:14 UTC [6]: [41-1] user=,db= LOG: MultiXact member wraparound protections are now enabled
2015-08-30 19:40:14 UTC [29303]: [1-1] user=,db= LOG: autovacuum launcher started
2015-08-30 19:40:14 UTC [1]: [4-1] user=,db= LOG: database system is ready to accept connections
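The difference between the two incidents is visible in the logs alone: the failed attempt has a "received promote request" line with no later "database system is ready to accept connections" line. A rough sketch of a scanner for that pattern follows. The function name `promotion_status` is hypothetical, and the regular expression assumes the same log line prefix as in the excerpts quoted in this message; other `log_line_prefix` settings would need a different pattern.

```python
import re

# Assumes the prefix format seen in the quoted excerpts:
# "<date> <time> UTC [pid]: [n-m] user=...,db=... LEVEL: message"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC "
    r"\[(?P<pid>\d+)\]: \[[\d-]+\] user=[^,]*,db=\S* "
    r"(?P<level>[A-Z]+): +(?P<msg>.*)$"
)


def promotion_status(lines):
    """Return (status, requested_at, completed_at) for a log excerpt.

    status is "completed", "pending" (promote requested but the server
    never reported being ready for connections), or "not requested".
    """
    requested_at = None
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m:
            continue  # skip lines that do not match the assumed prefix
        msg = m.group("msg")
        if msg.startswith("received promote request"):
            requested_at = m.group("ts")
        elif requested_at and msg.startswith(
            "database system is ready to accept connections"
        ):
            return ("completed", requested_at, m.group("ts"))
    if requested_at:
        return ("pending", requested_at, None)
    return ("not requested", None, None)
```

Run against the two excerpts in this message, the first yields a "pending" status and the second a "completed" one. The check only reports that a promotion stalled; it says nothing about why.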

I am unsure whether this promotion failure is a bug or a glitch, but the promote procedure is automated and has been tested a couple of hundred times, so I am certain we initiated the promote correctly. Searching the internet, I haven't found anything similar. Does anybody know of any reason why the slave didn't promote after receiving the promote signal? Looking at the data, it seems the slave aborted the promote process.

Both instances run 9.4.4 and are connected via streaming replication.

Regards,
Mladen Marinović


--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


