Re: Streaming replication, master recycling

Sameer Kumar <sameer.kumar@xxxxxxxxxx> · Wed, 11 May 2016 11:36:49 +0000

On Wed, May 11, 2016 at 4:35 PM <fredrik@xxxxxxxxxxxxx> wrote:
I apologise for the missing data. 

We are running 9.1.15 on Debian servers. 

I think there was a patch in v9.3 which makes sure that if the master is shut down properly (smart or fast mode), pending WAL is replicated before it shuts down. Also, timeline switches have been written to the WAL files since v9.3.

So I don't see a reason why a proper switchover, with a fast shutdown of the master followed by promotion of the standby, would cause trouble with v9.3 or greater.

Of course I can be wrong (and naive!) and this may not apply to your case.
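
To make the ordering concrete, here is a minimal sketch in Python of the controlled switchover I have in mind: fast shutdown of the master first, then promotion of the standby through its trigger file. The data directory and trigger file paths are hypothetical placeholders, and the function only builds the commands (each would be run on its own host); it does not execute anything.

```python
def switchover_plan(master_datadir, standby_trigger_file):
    """Return the ordered steps of a controlled switchover.

    Both arguments are hypothetical placeholder paths supplied by the caller.
    Each step is a (host role, command) pair; nothing is executed here.
    """
    return [
        # 1. On the old master: fast shutdown, so it stops cleanly
        #    before the standby is promoted.
        ("old master", ["pg_ctl", "-D", master_datadir, "stop", "-m", "fast"]),
        # 2. On the standby: create the file named by trigger_file in its
        #    recovery.conf, which ends recovery and promotes it.
        ("standby", ["touch", standby_trigger_file]),
    ]
```

The point of returning the steps as data is only to show the order; how you run them on each node is up to your own tooling.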

When we promote the old slave, it seems to go fine. Are you saying that it will cause issues down the line if the previous master is not shut down before promoting? 

You might want to share the recovery.conf from the standby node and the recovery.conf which you create on the lost node (old master) when adding it back as a standby.

I was actually more concerned with the fact that we (sometimes) recycle the old master without doing a full basebackup. 

I have done this with v9.2 and v9.3 and it seems to work fine, as long as you have not missed any transactions from the master (controlled switchover). If the master went down before it could replicate the last committed transaction, I don't think the lost node (old master) will be able to join the new timeline of the standby, so your replication would not work (even though the node has started up).
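
As a rough illustration of that condition, here is a small Python sketch. It assumes you have already obtained two WAL locations in the usual hex "hi/lo" text form: where the old master's WAL ends, and where the promoted standby started its new timeline. How you obtain them is not covered here, and the function names are hypothetical.

```python
def parse_lsn(text):
    """Parse a WAL location such as '0/3000060' into a comparable tuple."""
    hi, lo = text.split("/")
    return (int(hi, 16), int(lo, 16))


def can_rejoin_without_basebackup(old_master_end_lsn, timeline_fork_lsn):
    """Return True if the old master wrote no WAL past the point where
    the promoted standby started its new timeline.

    Both arguments are assumed inputs gathered by the operator.
    """
    return parse_lsn(old_master_end_lsn) <= parse_lsn(timeline_fork_lsn)
```

If the old master's WAL ends past the fork point, it holds changes the new master never received, which is the case where I would not expect it to be able to follow the new timeline.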

Again, this seems to work, but this presentation seems to indicate that this can cause problems (while seeming to work): http://hlinnaka.iki.fi/presentations/NordicPGDay2015-pg_rewind.pdf

The note is on page 14, under the headline: "Naive approach".

thank you for your support,
Fredrik

On 11 May 2016 at 12:47:13 +02:00, Venkata Balaji N <nag1010@xxxxxxxxx> wrote:

On Wed, May 11, 2016 at 2:31 PM,  <fredrik@xxxxxxxxxxxxx> wrote:
Hi All,

We are currently using streaming replication on multiple node pairs. We are seeing some issues, but I am mainly interested in clarification. 

When a failover occurs, we touch the trigger file, promoting the previous slave to master. That works perfectly. 

For recycling the previous master, we create a recovery.conf (with recovery_target_timeline = 'latest') and *try* to start up. If postgresql starts up, we accept it as a new slave. If it does not, we proceed with a full basebackup.
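
A minimal sketch of that recycle procedure, in Python, might look like this. The callables write_conf, try_start and full_basebackup are hypothetical stand-ins for whatever scripts do the real work, and the primary_conninfo value is a placeholder. Including standby_mode and primary_conninfo next to recovery_target_timeline is an assumption about what the file contains.

```python
def render_recovery_conf(primary_conninfo):
    """Build recovery.conf text for turning the old master into a standby.

    primary_conninfo is a placeholder connection string; it is assumed
    to contain no single quotes, since no escaping is done here.
    """
    lines = [
        "standby_mode = 'on'",
        "primary_conninfo = '%s'" % primary_conninfo,
        "recovery_target_timeline = 'latest'",
    ]
    return "\n".join(lines) + "\n"


def recycle_old_master(write_conf, try_start, full_basebackup, primary_conninfo):
    """Try to bring the old master back as a standby, else re-seed it.

    write_conf(text) stores recovery.conf, try_start() returns True if the
    server started, full_basebackup() re-seeds from the new master.
    All three are hypothetical callables supplied by the caller.
    """
    write_conf(render_recovery_conf(primary_conninfo))
    if try_start():
        return "accepted as slave"
    full_basebackup()
    return "full basebackup"
```

Note that the only success test in this sketch is whether the server starts, which is exactly the decision described above.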

Which version of PostgreSQL are you using?

You need to shut down the master first, then promote the slave, and then the other way round, but this can be clarified only if you let us know the PostgreSQL version. This is quite tricky in 9.2.x and from 9.3.x.

Regards,
Venkata B N

Fujitsu Australia

-- 
Best Regards
Sameer Kumar | DB Solution Architect 
ASHNIK PTE. LTD.
101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533
T: +65 6438 3504 | M: +65 8110 0350 | www.ashnik.com