RE: Unable to start replica after failover

"Lahnov, Igor" <Igor.Lahnov@xxxxxxxxxx> · Tue, 23 Aug 2022 12:31:59 +0000

We know what the problem is, but don't know how to solve it correctly.

After failover, *new leader* promoted and read local partial log to LSN 54E/FB348118

->
2022-05-23 01:47:52.124 [12088] LOG: record with incorrect prev-link 0/100 at 54E/FB348118
2022-05-23 01:47:52.124 [12088] LOG: redo done at 54E/FB3480F0 system usage: CPU: user: 321.21 s, system: 144.77 s, elapsed: 354977.44 s
rmgr: XLOG len (rec/tot): 42/ 42, tx: 0, lsn: 54E/FB348118, prev 54E/FB3480F0, desc: END_OF_RECOVERY tli 3; prev tli 2; time 2022-05-23 01:47:53.342100 
2022-05-23 01:47:52.900 [12088] LOG: selected new timeline ID: 3
2022-05-23 01:47:53.658 [12084] LOG: database system is ready to accept connections
<-

The *previous leader* started after a failure and read the local partial log to LSN 54E/FB348150

->
2022-05-23 01:50:12.945 [122891] LOG: redo starts at 54E/F97F4878
2022-05-23 01:50:13.010 [122891] LOG: invalid record length at 54E/FB348150: wanted 24, got 0
2022-05-23 01:50:13.011 [122891] LOG: consistent recovery state reached at 54E/FB348150
2022-05-23 01:50:13.012 [122883] LOG: database system is ready to accept read-only connections
<-

Our 'restore_command' on *previous leader* restores a partial file from archive (from *new leader*)

->
2022-05-23 01:50:14 [123730]: [1-1]: INFO: pg_probackup archive-get WAL file: 000000020000054E000000FB, remote: ssh, threads: 1/1, batch: 1
<-

And this file has a higher priority than the original local partial file on the * previous leader*. And this leads to the problem:

---
2022-05-23 01:50:14.448 [122891] LOG: new timeline 3 forked off current database system timeline 2 before current recovery point 54E/FB348150
---

And pg_rewind tries to use this file (from *new leader*) to build a map of pages that have changed since the last checkpoint

---
2022-05-23 01:51:32,202 INFO: Lock owner: pg51; I am pg01
2022-05-23 01:51:32,209 INFO: Local timeline=2 lsn=54E/FB348150
2022-05-23 01:51:32,252 INFO: master_timeline=3
2022-05-23 01:51:32,254 INFO: master: history=1 53E/FD65D298 no recovery target specified
2 54E/FB348118 no recovery target specified
2022-05-23 01:51:32,499 INFO: running pg_rewind from pg51
2022-05-23 01:51:32,574 INFO: running pg_rewind from dbname=postgres user=postgres host= IP port=5432 target_session_attrs=read-write
2022-05-23 01:51:32,640 INFO: pg_rewind exit code=1
2022-05-23 01:51:32,640 INFO: stdout=
2022-05-23 01:51:32,640 INFO: stderr=pg_rewind: servers diverged at WAL location 54E/FB348118 on timeline 2
pg_rewind: fatal: could not find previous WAL record at 54E/FB348118: unexpected pageaddr 54E/7B34A000 in log segment 000000020000054E000000FB, offset 3448832
---

We checked this log (from *new leader*), and when direct reading, the output like this:
---
record with incorrect prev-link 0/100 at 54E/FB348118
----
when backward reading:
---
54E/FB348118: unexpected pageaddr 54E/7B34A000 in log segment 000000020000054E000000FB, offset 3448832
---

Now we have disabled restore_command, but I think it's not best solution. What the correct solution in this case?