It appears that your missing is following in the recovery.conf file. It instructs the replica to follow a new timeline when presented; without it will refuse to follow the promoted node. recovery_target_timeline=latest Are you using sync replication? > On Jan 10, 2018, at 2:06 PM, Pritam Barhate <pritambarhate@xxxxxxxxx> wrote: > > Hi, > > I am trying to test the failover scenario on a brand new 3 node setup. > > pg1 is master > pg2 and pg3 are standby hosts in streaming replication, both following pg1. > > When I halt pg1, promote pg2 and update recovery.conf on pg3 to point to pg2, the replication is broken. Can you please help me identify the correct procedure to achieve this failover? > > Here are the details: > > Software Versions: > > PostgreSQL: 9.6 > Ubuntu 16.04 > PgBackRest 1.25 > > Here is the scenario: > > Set up 3 nodes in streaming replication: > > master (pg1), standby 1 (pg2), standby 2 (pg3) > > Also, I have set up pgbackrest, to back up from standby. It's working correctly. > > Run the following on pg1: > > sudo -u postgres psql -x -c "select * from pg_stat_replication;" > > -[ RECORD 1 ]----+------------------------------ > pid | 20714 > usesysid | 16384 > usename | replicator > application_name | walreceiver > client_addr | 10.0.15.11 > client_hostname | > client_port | 48640 > backend_start | 2018-01-10 18:17:46.105694+00 > backend_xmin | > state | streaming > sent_location | 0/6000060 > write_location | 0/6000060 > flush_location | 0/6000060 > replay_location | 0/6000060 > sync_priority | 0 > sync_state | async > -[ RECORD 2 ]----+------------------------------ > pid | 20715 > usesysid | 16384 > usename | replicator > application_name | walreceiver > client_addr | 10.0.15.12 > client_hostname | > client_port | 42198 > backend_start | 2018-01-10 18:17:48.712575+00 > backend_xmin | > state | streaming > sent_location | 0/6000060 > write_location | 0/6000060 > flush_location | 0/6000060 > replay_location | 0/6000060 > sync_priority | 0 > sync_state | async > > > It reports both the standbys working correctly. > > Halt pg1. > > On pg2: > > sudo -u postgres pg_ctlcluster 9.6 main promote > > On pg3: > > sudo service postgresql stop > > sudo -u postgres vim /var/lib/postgresql/9.6/main/recovery.conf > > modify it's contents are as follows: > > ``` > primary_conninfo = 'host=pg2 port=5432 user=replicator' > standby_mode = 'on' > restore_command = '/usr/bin/pgbackrest --stanza=main archive-get %f "%p"' > ``` > > sudo service postgresql start > > On pg2: > > sudo -u postgres psql -x -c "select * from pg_stat_replication;" > > -[ RECORD 1 ]----+------------------------------ > pid | 21238 > usesysid | 16384 > usename | replicator > application_name | walreceiver > client_addr | 10.0.15.12 > client_hostname | > client_port | 44852 > backend_start | 2018-01-10 18:22:20.980701+00 > backend_xmin | > state | startup > sent_location | > write_location | > flush_location | > replay_location | > sync_priority | 0 > sync_state | async > > As you can see sent_location, write_location, flush_location all are empty. > > On pg3: > > tail -f /var/log/postgresql/postgresql-9.6-main.log > 2018-01-10 18:22:19.490 UTC [21229] LOG: restored log file "000000010000000000000005" from archive > 2018-01-10 18:22:19.497 UTC [21229] LOG: redo starts at 0/5000028 > 2018-01-10 18:22:19.757 UTC [21243] postgres@postgres FATAL: the database system is starting up > 2018-01-10 18:22:20.188 UTC [21229] LOG: restored log file "000000010000000000000006" from archive > 2018-01-10 18:22:20.195 UTC [21229] LOG: consistent recovery state reached at 0/7000000 > 2018-01-10 18:22:20.195 UTC [21228] LOG: database system is ready to accept read only connections > 2018-01-10 18:22:20.812 UTC [21229] LOG: invalid magic number 0000 in log segment 000000010000000000000007, offset 0 > 2018-01-10 18:22:20.824 UTC [21255] LOG: fetching timeline history file for timeline 2 from primary server > 2018-01-10 18:22:20.828 UTC [21255] LOG: primary server contains no more WAL on requested timeline 1 > 2018-01-10 18:22:21.417 UTC [21255] LOG: primary server contains no more WAL on requested timeline 1 > > > At this point is I insert data in pg2, it doesn't reflect in pg3. > > Please help. > > Regards, > > Pritam. >