On Wed, 2018-10-10 at 11:24 +0800, magodo wrote: > Hello, > > I'm figuring out how to do PITR on a HA setup for postgres 9.6. I > almost manage to do it, while I faced following issue: > > > [1. create basebackup] > > [2. insert 2 rows] > > on both primary and standby's pg_xlog > > 000000010000000000000005 > > rmgr: Transaction len (rec/tot): 34/ 34, tx: 633, lsn: > 0/05016120, prev 0/050160E0, desc: COMMIT 2018-10-10 03:04:58.459754 > UTC - insert 1 (before: 2018-10-10 11:05:04 +08:00) > > rmgr: Transaction len (rec/tot): 34/ 34, tx: 634, lsn: > 0/05016188, prev 0/05016148, desc: COMMIT 2018-10-10 03:05:07.010478 > UTC - insert 2 (before: 2018-10-10 11:05:09 +08:00) > > [3. failover and promote the standby] > > 000000010000000000000005 been archived as > 000000010000000000000005.partial, and a new wal is switched out > named: > 000000020000000000000005, which contains above two commits. > > [4. recover to "after insert 1st row"] > > Stop new standby(the rewinded primary), and on the new > primary(promoted > standby), do following: > > 1. stop DB > 2. sync $PGDATA against basebackup > 3. create a recovery.conf with following content: > > restore_command = 'cp /var/lib/pgsql/9.6/data/archive/%f %p' > recovery_target_timeline = 2 > recovery_target_time = '2018-10-10 11:05:04 +08:00' > > 4. stat DB > > Then I expect server is recovered to the time before insert 2nd row > but > after insert 1st row. The result is as I expected, but it is in a > "read-only transaction". The log shows following: > > < 2018-10-10 03:09:56.790 UTC > LOG: database system was > interrupted; > last known up at 2018-10-10 02:57:59 UTC > < 2018-10-10 03:09:56.931 UTC > DEBUG: restore_command = 'cp > /var/lib/pgsql/9.6/data/archive/%f %p' > < 2018-10-10 03:09:56.931 UTC > DEBUG: recovery_target_timeline = 2 > < 2018-10-10 03:09:56.931 UTC > DEBUG: recovery_target_time = '2018- > 10-10 03:05:04+00' > < 2018-10-10 03:09:56.934 UTC > LOG: restored log file > "00000002.history" from archive > < 2018-10-10 03:09:56.934 UTC > LOG: starting point-in-time recovery > to 2018-10-10 03:05:04+00 > < 2018-10-10 03:09:56.936 UTC > LOG: restored log file > "00000002.history" from archive > cp: cannot stat > '/var/lib/pgsql/9.6/data/archive/000000020000000000000004': No such > file or directory > < 2018-10-10 03:09:56.941 UTC > DEBUG: could not restore file > "000000020000000000000004" from archive: child process exited with > exit > code > 1 > > > < 2018-10-10 03:09:56.952 UTC > LOG: restored log file > "000000010000000000000004" from archive > < 2018-10-10 03:09:57.012 UTC > DEBUG: got WAL segment from archive > < 2018-10-10 03:09:57.013 UTC > DEBUG: checkpoint record is at > 0/4000060 > < 2018-10-10 03:09:57.013 UTC > DEBUG: redo record is at 0/4000028; > shutdown FALSE > < 2018-10-10 03:09:57.013 UTC > DEBUG: next transaction ID: 0:632; > next OID: 16385 > < 2018-10-10 03:09:57.013 UTC > DEBUG: next MultiXactId: 1; next > MultiXactOffset: 0 > < 2018-10-10 03:09:57.013 UTC > DEBUG: oldest unfrozen transaction > ID: > 623, in database 1 > < 2018-10-10 03:09:57.013 UTC > DEBUG: oldest MultiXactId: 1, in > database 1 > < 2018-10-10 03:09:57.013 UTC > DEBUG: commit timestamp Xid > oldest/newest: 0/0 > < 2018-10-10 03:09:57.013 UTC > DEBUG: transaction ID wrap limit is > 2147484270, limited by database with OID 1 > < 2018-10-10 03:09:57.013 UTC > DEBUG: MultiXactId wrap limit is > 2147483648, limited by database with OID 1 > < 2018-10-10 03:09:57.013 UTC > DEBUG: starting up replication slots > < 2018-10-10 03:09:57.013 UTC > DEBUG: restoring replication slot > from > "pg_replslot/repl_slot/state" > < 2018-10-10 03:09:57.015 UTC > DEBUG: starting up replication > origin > progress state > < 2018-10-10 03:09:57.020 UTC > DEBUG: resetting unlogged relations: > cleanup 1 init 0 > < 2018-10-10 03:09:57.021 UTC > DEBUG: initializing for hot standby > < 2018-10-10 03:09:57.021 UTC > LOG: redo starts at 0/4000028 > < 2018-10-10 03:09:57.021 UTC > DEBUG: recovery snapshots are now > enabled > < 2018-10-10 03:09:57.021 UTC > CONTEXT: xlog redo at 0/4000028 for > Standby/RUNNING_XACTS: nextXid 632 latestCompletedXid 631 > oldestRunningXid > 632 > > > < 2018-10-10 03:09:57.021 UTC > DEBUG: end of backup reached > < 2018-10-10 03:09:57.021 UTC > CONTEXT: xlog redo at 0/40000D0 for > XLOG/BACKUP_END: 0/4000028 > < 2018-10-10 03:09:57.022 UTC > LOG: consistent recovery state > reached > at 0/40000F8 > < 2018-10-10 03:09:57.023 UTC > DEBUG: checkpointer updated shared > memory configuration values > < 2018-10-10 03:09:57.024 UTC > LOG: database system is ready to > accept read only connections > < 2018-10-10 03:09:57.028 UTC > DEBUG: archived transaction log file > "000000010000000000000004" > < 2018-10-10 03:09:57.035 UTC > DEBUG: archived transaction log file > "00000002.history" > < 2018-10-10 03:09:57.056 UTC > LOG: restored log file > "000000020000000000000005" from archive > < 2018-10-10 03:09:57.116 UTC > DEBUG: got WAL segment from archive > < 2018-10-10 03:09:57.117 UTC > LOG: recovery stopping before commit > of transaction 634, time 2018-10-10 03:05:07.010478+00 > < 2018-10-10 03:09:57.117 UTC > LOG: recovery has paused > < 2018-10-10 03:09:57.117 UTC > HINT: Execute > pg_xlog_replay_resume() > to continue. > < 2018-10-10 03:09:57.119 UTC > DEBUG: archived transaction log file > "000000020000000000000005" > < 2018-10-10 03:09:57.759 UTC > DEBUG: forked new backend, pid=254 > socket=11 > < 2018-10-10 03:09:57.771 UTC > DEBUG: server process (PID 254) > exited > with exit code 0 > > And I have to execute `pg_xlog_replay_resume()` on the new primary so > that it will branch out to new timeline 3 and work normally... > > Would anyone please tell me what's going on here? How can I avoid > doing > a manual resume? > > --- > Magodo I realized that because the basebackup is made from a hot standby, in which the `hot_standby = on` is set in *postgresql.conf*. >From reference manual: > ... This parameter can only be set at server start. It only has effect during archive recovery or in standby mode. So the primary after PITR has become a hot standby... Then I tried a `pg_ctl promote` on it, but nothing happened.. Why? --- Magodo