Hello, I'm figuring out how to do PITR on a HA setup for postgres 9.6. I almost manage to do it, while I faced following issue: [1. create basebackup] [2. insert 2 rows] on both primary and standby's pg_xlog 000000010000000000000005 rmgr: Transaction len (rec/tot): 34/ 34, tx: 633, lsn: 0/05016120, prev 0/050160E0, desc: COMMIT 2018-10-10 03:04:58.459754 UTC - insert 1 (before: 2018-10-10 11:05:04 +08:00) rmgr: Transaction len (rec/tot): 34/ 34, tx: 634, lsn: 0/05016188, prev 0/05016148, desc: COMMIT 2018-10-10 03:05:07.010478 UTC - insert 2 (before: 2018-10-10 11:05:09 +08:00) [3. failover and promote the standby] 000000010000000000000005 been archived as 000000010000000000000005.partial, and a new wal is switched out named: 000000020000000000000005, which contains above two commits. [4. recover to "after insert 1st row"] Stop new standby(the rewinded primary), and on the new primary(promoted standby), do following: 1. stop DB 2. sync $PGDATA against basebackup 3. create a recovery.conf with following content: restore_command = 'cp /var/lib/pgsql/9.6/data/archive/%f %p' recovery_target_timeline = 2 recovery_target_time = '2018-10-10 11:05:04 +08:00' 4. stat DB Then I expect server is recovered to the time before insert 2nd row but after insert 1st row. The result is as I expected, but it is in a "read-only transaction". The log shows following: < 2018-10-10 03:09:56.790 UTC > LOG: database system was interrupted; last known up at 2018-10-10 02:57:59 UTC < 2018-10-10 03:09:56.931 UTC > DEBUG: restore_command = 'cp /var/lib/pgsql/9.6/data/archive/%f %p' < 2018-10-10 03:09:56.931 UTC > DEBUG: recovery_target_timeline = 2 < 2018-10-10 03:09:56.931 UTC > DEBUG: recovery_target_time = '2018- 10-10 03:05:04+00' < 2018-10-10 03:09:56.934 UTC > LOG: restored log file "00000002.history" from archive < 2018-10-10 03:09:56.934 UTC > LOG: starting point-in-time recovery to 2018-10-10 03:05:04+00 < 2018-10-10 03:09:56.936 UTC > LOG: restored log file "00000002.history" from archive cp: cannot stat '/var/lib/pgsql/9.6/data/archive/000000020000000000000004': No such file or directory < 2018-10-10 03:09:56.941 UTC > DEBUG: could not restore file "000000020000000000000004" from archive: child process exited with exit code 1 < 2018-10-10 03:09:56.952 UTC > LOG: restored log file "000000010000000000000004" from archive < 2018-10-10 03:09:57.012 UTC > DEBUG: got WAL segment from archive < 2018-10-10 03:09:57.013 UTC > DEBUG: checkpoint record is at 0/4000060 < 2018-10-10 03:09:57.013 UTC > DEBUG: redo record is at 0/4000028; shutdown FALSE < 2018-10-10 03:09:57.013 UTC > DEBUG: next transaction ID: 0:632; next OID: 16385 < 2018-10-10 03:09:57.013 UTC > DEBUG: next MultiXactId: 1; next MultiXactOffset: 0 < 2018-10-10 03:09:57.013 UTC > DEBUG: oldest unfrozen transaction ID: 623, in database 1 < 2018-10-10 03:09:57.013 UTC > DEBUG: oldest MultiXactId: 1, in database 1 < 2018-10-10 03:09:57.013 UTC > DEBUG: commit timestamp Xid oldest/newest: 0/0 < 2018-10-10 03:09:57.013 UTC > DEBUG: transaction ID wrap limit is 2147484270, limited by database with OID 1 < 2018-10-10 03:09:57.013 UTC > DEBUG: MultiXactId wrap limit is 2147483648, limited by database with OID 1 < 2018-10-10 03:09:57.013 UTC > DEBUG: starting up replication slots < 2018-10-10 03:09:57.013 UTC > DEBUG: restoring replication slot from "pg_replslot/repl_slot/state" < 2018-10-10 03:09:57.015 UTC > DEBUG: starting up replication origin progress state < 2018-10-10 03:09:57.020 UTC > DEBUG: resetting unlogged relations: cleanup 1 init 0 < 2018-10-10 03:09:57.021 UTC > DEBUG: initializing for hot standby < 2018-10-10 03:09:57.021 UTC > LOG: redo starts at 0/4000028 < 2018-10-10 03:09:57.021 UTC > DEBUG: recovery snapshots are now enabled < 2018-10-10 03:09:57.021 UTC > CONTEXT: xlog redo at 0/4000028 for Standby/RUNNING_XACTS: nextXid 632 latestCompletedXid 631 oldestRunningXid 632 < 2018-10-10 03:09:57.021 UTC > DEBUG: end of backup reached < 2018-10-10 03:09:57.021 UTC > CONTEXT: xlog redo at 0/40000D0 for XLOG/BACKUP_END: 0/4000028 < 2018-10-10 03:09:57.022 UTC > LOG: consistent recovery state reached at 0/40000F8 < 2018-10-10 03:09:57.023 UTC > DEBUG: checkpointer updated shared memory configuration values < 2018-10-10 03:09:57.024 UTC > LOG: database system is ready to accept read only connections < 2018-10-10 03:09:57.028 UTC > DEBUG: archived transaction log file "000000010000000000000004" < 2018-10-10 03:09:57.035 UTC > DEBUG: archived transaction log file "00000002.history" < 2018-10-10 03:09:57.056 UTC > LOG: restored log file "000000020000000000000005" from archive < 2018-10-10 03:09:57.116 UTC > DEBUG: got WAL segment from archive < 2018-10-10 03:09:57.117 UTC > LOG: recovery stopping before commit of transaction 634, time 2018-10-10 03:05:07.010478+00 < 2018-10-10 03:09:57.117 UTC > LOG: recovery has paused < 2018-10-10 03:09:57.117 UTC > HINT: Execute pg_xlog_replay_resume() to continue. < 2018-10-10 03:09:57.119 UTC > DEBUG: archived transaction log file "000000020000000000000005" < 2018-10-10 03:09:57.759 UTC > DEBUG: forked new backend, pid=254 socket=11 < 2018-10-10 03:09:57.771 UTC > DEBUG: server process (PID 254) exited with exit code 0 And I have to execute `pg_xlog_replay_resume()` on the new primary so that it will branch out to new timeline 3 and work normally... Would anyone please tell me what's going on here? How can I avoid doing a manual resume? --- Magodo