Re: Slave promotion problem...

Martín Marqués <martin@xxxxxxxxxxxxxxx> · Mon, 31 Aug 2015 09:38:26 -0300

On 31/08/15 at 03:29, marin@xxxxxxxx wrote:
> Last week we had some problems on the master server which caused a
> failover to the slave (the master was completely unresponsive for
> reasons still unknown). The slave received the promote signal (pg_ctl
> promote) and logged it:
> 2015-08-28 23:05:10 UTC [6]: [50-1] user=,db= LOG:  received promote request
> 2015-08-28 23:05:10 UTC [467]: [2-1] user=,db= FATAL:  terminating walreceiver process due to administrator command
>
> Five hours later the slave still had not been promoted. Meanwhile we
> fixed the master and restarted it. The slave was restarted and it
> behaved as if the promote signal had never arrived, connecting to the
> master as a regular slave.

Aren't there any further logs after the walreceiver termination?

Up to here everything looks fine, but we have no idea what was logged
afterwards.
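
As a rough sketch of what to look for: after "received promote
request", a 9.x standby normally goes on to log messages such as
"redo done at", "selected new timeline ID", "archive recovery complete"
and "database system is ready to accept connections". The exact list
below is an assumption and may differ between versions, and
promotion_progress is only an illustrative helper name, not part of any
PostgreSQL tool.

```python
# Messages a standby is expected to log while completing a promotion.
# Assumed list for illustration; verify against your server version.
EXPECTED_AFTER_PROMOTE = [
    "redo done at",
    "selected new timeline ID",
    "archive recovery complete",
    "database system is ready to accept connections",
]


def promotion_progress(log_lines):
    """Report which expected messages follow the last promote request.

    Returns None if no promote request is found, otherwise a tuple
    (seen, missing) of message fragments, in the expected order.
    """
    start = None
    for i, line in enumerate(log_lines):
        if "received promote request" in line:
            start = i  # keep the last occurrence
    if start is None:
        return None

    tail = log_lines[start + 1:]
    seen = [m for m in EXPECTED_AFTER_PROMOTE
            if any(m in line for line in tail)]
    missing = [m for m in EXPECTED_AFTER_PROMOTE if m not in seen]
    return seen, missing
```

Feeding it the lines of the standby's log file would show how far the
promotion got before it stalled, if it got anywhere at all.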

> I am unsure if this promotion failure is a bug/glitch, but the promote
> procedure is automated and tested a couple of hundred times so I am
> certain we initiated the promote correctly.

Are you using homemade scripts? Maybe you need to test them more
thoroughly, with different environment parameters.
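
One thing such a script could do, as a minimal sketch, is confirm that
the promotion actually completed instead of assuming it did. The
is_in_recovery callable below is hypothetical: it is assumed to run
"SELECT pg_is_in_recovery()" against the standby by whatever client
your scripts already use and return the boolean result. The function
name and parameters are also just for illustration.

```python
import time


def wait_for_promotion(is_in_recovery, timeout=60.0, interval=1.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until the server has left recovery or the timeout expires.

    is_in_recovery: hypothetical callable returning True while the
    server is still a standby (e.g. the result of pg_is_in_recovery()).
    Returns True if promotion was observed, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if not is_in_recovery():
            return True   # server now accepts writes as a master
        if clock() >= deadline:
            return False  # still in recovery: raise an alert
        sleep(interval)
```

A False return here would have flagged the stuck promotion within the
timeout rather than five hours later.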

Regards,

-- 
Martín Marqués                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)