2020年8月9日(日) 0:12 Don Seiler <don@xxxxxxxxx>: > > On Sat, Aug 8, 2020 at 9:14 AM Ian Lawrence Barwick <barwick@xxxxxxxxx> wrote: >> >> >> After the failure to fetch the expected WAL file, the standby >> (replica) should be >> attempting to fetch the 00000002.history file; do you see any wal-g >> error messages >> about that? > > > No. I when I first start the new replica, it shows: > > 2020-08-08 06:16:47.536 GMT [30368] LOG: database system was shut down at 2020-08-08 05:29:12 GMT > 2020-08-08 06:16:47.768 GMT [30379] [unknown]@[unknown] - [unknown] LOG: connection received: host=[local] > 2020-08-08 06:16:47.768 GMT [30379] postgres@postgres - [unknown] FATAL: the database system is starting up > 2020-08-08 06:16:47.769 GMT [30380] [unknown]@[unknown] - [unknown] LOG: connection received: host=[local] > 2020-08-08 06:16:47.769 GMT [30380] postgres@banno_login - [unknown] FATAL: the database system is starting up > 2020-08-08 06:16:47.849 GMT [30381] [unknown]@[unknown] - [unknown] LOG: connection received: host=[local] > 2020-08-08 06:16:47.849 GMT [30381] [unknown]@[unknown] - [unknown] LOG: incomplete startup packet > WARNING: 2020/08/08 06:16:47.857808 You are using deprecated functionality that uses an external gpg library. > It will be removed in next major version. > Please set GPG key using environment variables WALG_PGP_KEY or WALG_PGP_KEY_PATH. > ERROR: 2020/08/08 06:16:48.041147 openpgp: invalid argument: no armored data found > 2020-08-08 06:16:48.042 GMT [30368] LOG: entering standby mode > ERROR: 2020/08/08 06:16:48.423220 Archive '000000010000142E0000005B' does not exist. > 2020-08-08 06:16:48.428 GMT [30368] LOG: consistent recovery state reached at 142E/5B000098 > 2020-08-08 06:16:48.428 GMT [30368] LOG: invalid record length at 142E/5B000098: wanted 24, got 0 > 2020-08-08 06:16:48.429 GMT [30366] LOG: database system is ready to accept read only connections > > From there I see these messages repeated: > > ERROR: 2020/08/08 06:16:49.029836 Archive '000000010000142E0000005B' does not exist. > WARNING: 2020/08/08 06:16:49.510460 You are using deprecated functionality that uses an external gpg library. > It will be removed in next major version. > Please set GPG key using environment variables WALG_PGP_KEY or WALG_PGP_KEY_PATH. > ERROR: 2020/08/08 06:16:49.513288 openpgp: invalid argument: no armored data found > > There's no attempt to look for 00000002.history that I would normally expect when a replica doing WAL restore/recovery runs of out of files to restore. In that case are you a) absolutely sure recovery.conf contains "recovery_target_timeline = 'latest'"? b) the standby was restarted *after* the change was made? I ask because not attempting to fetch a history file would be a sign that "recovery_target_timeline" is not set (or set but not to "latest"). (...) > There are WAL files in that new replica from when it was still the old primary, would those be causing confusion?: > > postgres:~/9.6/main/pg_xlog$ ls -lh > total 273M > -rw------- 1 postgres postgres 340 Aug 7 22:21 000000010000142C000000AD.00000028.backup > -rw------- 1 postgres postgres 16M Aug 8 05:24 000000010000142E00000055 > -rw------- 1 postgres postgres 16M Aug 8 05:25 000000010000142E00000056 > -rw------- 1 postgres postgres 16M Aug 8 05:26 000000010000142E00000057 > -rw------- 1 postgres postgres 16M Aug 8 05:27 000000010000142E00000058 > -rw------- 1 postgres postgres 16M Aug 8 05:28 000000010000142E00000059 > -rw------- 1 postgres postgres 16M Aug 8 05:29 000000010000142E0000005A > -rw------- 1 postgres postgres 16M Aug 8 05:29 000000010000142E0000005B > -rw------- 1 postgres postgres 16M Aug 8 05:16 000000010000142E0000005C > -rw------- 1 postgres postgres 16M Aug 8 05:15 000000010000142E0000005D > -rw------- 1 postgres postgres 16M Aug 8 05:18 000000010000142E0000005E > -rw------- 1 postgres postgres 16M Aug 8 05:14 000000010000142E0000005F > -rw------- 1 postgres postgres 16M Aug 8 05:17 000000010000142E00000060 > -rw------- 1 postgres postgres 16M Aug 8 05:19 000000010000142E00000061 > -rw------- 1 postgres postgres 16M Aug 8 05:22 000000010000142E00000062 > -rw------- 1 postgres postgres 16M Aug 8 05:21 000000010000142E00000063 > -rw------- 1 postgres postgres 16M Aug 8 05:20 000000010000142E00000064 > -rw------- 1 postgres postgres 16M Aug 8 05:23 000000010000142E00000065 >From the timestamps, 000000010000142E0000005C and later are files which were "recycled" for future use and as such are perfectly normal. Regards Ian Barwick -- Ian Barwick https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services