pg9.6 when is a promoted cluster ready to accept "rewind" request?

magodo <wztdyl@xxxxxxxx> · Mon, 12 Nov 2018 13:11:23 +0800

Dear supporters,

I'm writing some scripts to implement manual failover. I have two
clusters (say p1 and p2), where one is the primary (e.g. p1) and the
other is the standby (e.g. p2). The manual failover procedure is
straightforward:

1. promote on p2
2. wait `pg_is_ready()` on p2
3. rewind on p1
4. prepare a recovery.conf on p1
5. start p1
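
The steps above could be sketched roughly as follows. This is only an
illustration, not my actual script: the function name, the paths, the
host name and the conninfo are made up, step 2 is assumed to map onto
the `pg_isready` utility, and p1 is assumed to have been shut down
cleanly before the rewind.

```python
def failover_plan(p1_pgdata, p2_pgdata, p2_host, p2_port=5432, user="postgres"):
    """Return the commands (as argv lists) and the recovery.conf text
    for making p2 the primary and p1 its standby.

    Nothing is executed here; the caller decides where and how to run
    each command (the first two target p2, the last two run on p1).
    """
    conninfo = "host={} port={} user={}".format(p2_host, p2_port, user)
    commands = [
        # Step 1: promote the standby (run on p2).
        ["pg_ctl", "promote", "-D", p2_pgdata],
        # Step 2: poll until this exits with status 0 (p2 accepts connections).
        ["pg_isready", "-h", p2_host, "-p", str(p2_port)],
        # Step 3: rewind the old primary against the new one (run on p1).
        ["pg_rewind", "--target-pgdata=" + p1_pgdata,
         "--source-server=" + conninfo + " dbname=postgres"],
        # Step 5: start p1, which then comes up as a standby.
        ["pg_ctl", "start", "-D", p1_pgdata],
    ]
    # Step 4: contents of recovery.conf, to be written into p1's data directory.
    recovery_conf = (
        "standby_mode = 'on'\n"
        "primary_conninfo = '{}'\n"
        "recovery_target_timeline = 'latest'\n"
    ).format(conninfo)
    return commands, recovery_conf
```

Each argv list can be passed to something like `subprocess.run(cmd,
check=True)`, with `recovery.conf` written between the rewind and the
final start.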

This should end up with the same HA setup, but with the roles switched.

It works fine if I run each step manually.

But if I run the steps sequentially in a script, it fails once I have
switched roles for the first time and try to switch back.

For example, with a fresh setup (timeline starts at 1), I first
switched roles, and that worked: p1 became the standby following p2,
which was then the primary. Then I switched roles again and got an
error. The log looks like this:

   < 2018-11-12 04:59:24.547 UTC > LOG:  entering standby mode
   < 2018-11-12 04:59:24.555 UTC > LOG:  redo starts at 0/4000028
   < 2018-11-12 04:59:24.566 UTC > LOG:  started streaming WAL from primary at 0/5000000 on timeline 1
   < 2018-11-12 04:59:24.566 UTC > FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000020000000000000005 has already been removed
   < 2018-11-12 04:59:24.577 UTC > LOG:  started streaming WAL from primary at 0/5000000 on timeline 1
   < 2018-11-12 04:59:24.577 UTC > FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000020000000000000005 has already been removed
   < 2018-11-12 04:59:25.413 UTC > FATAL:  the database system is starting up
   < 2018-11-12 04:59:26.416 UTC > FATAL:  the database system is starting up
   < 2018-11-12 04:59:27.419 UTC > FATAL:  the database system is starting up
   < 2018-11-12 04:59:28.422 UTC > FATAL:  the database system is starting up
   < 2018-11-12 04:59:29.425 UTC > FATAL:  the database system is starting up
   < 2018-11-12 04:59:29.576 UTC > LOG:  started streaming WAL from primary at 0/5000000 on timeline 1
   < 2018-11-12 04:59:29.576 UTC > FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000020000000000000005 has already been removed

The pg_rewind output is as follows:

   servers diverged at WAL position 0/5000060 on timeline 1
   rewinding from last common checkpoint at 0/4000060 on timeline 1

From the log, it seems the divergence point is computed on the wrong
timeline: it should be timeline 2 rather than 1.
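
To make the mismatch concrete, the segment name in the error can be
decoded by hand. The sketch below assumes the usual WAL file naming
(8 hex digits each for timeline, log and segment) and the default
16 MB segment size; the function name is just for illustration.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default segment size


def parse_wal_segment_name(name):
    """Split a 24-character WAL segment file name into its timeline ID
    and the LSN at which the segment starts."""
    if len(name) != 24:
        raise ValueError("expected a 24-character WAL segment name")
    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)
    seg_no = int(name[16:24], 16)
    start_lsn = (log_id << 32) + seg_no * WAL_SEGMENT_SIZE
    lsn_text = "{:X}/{:X}".format(start_lsn >> 32, start_lsn & 0xFFFFFFFF)
    return timeline, lsn_text


print(parse_wal_segment_name("000000020000000000000005"))  # (2, '0/5000000')
```

So the segment being requested belongs to timeline 2 and starts at
0/5000000, while the log says streaming started at 0/5000000 on
timeline 1.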

Furthermore, if I add a `sleep` between step 2 (after the promote) and
step 3 (rewind), it just works.

Hence, I suspect the promoted cluster is not ready to be used as a
rewind source right after promotion. Is there anything I need to wait
for before I rewind the old primary against the promoted cluster?
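
One condition I could imagine polling for instead of a fixed `sleep`
is the timeline recorded in the promoted cluster's control file. This
is only a guess at what matters here, not something I have confirmed:
the sketch assumes the text comes from running `pg_controldata` on p2's
data directory, and the function name is hypothetical.

```python
import re


def latest_checkpoint_timeline(controldata_output):
    """Extract the "Latest checkpoint's TimeLineID" value from the text
    printed by pg_controldata."""
    match = re.search(
        r"^Latest checkpoint's TimeLineID:\s*(\d+)\s*$",
        controldata_output,
        re.MULTILINE,
    )
    if match is None:
        raise ValueError("TimeLineID line not found in pg_controldata output")
    return int(match.group(1))
```

The script would then loop until this returns the new timeline (2
after the first switch, 3 after the second) before calling pg_rewind.
Issuing a `CHECKPOINT` on the promoted cluster might be another way to
get there, but that is also an assumption on my side.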

Thank you in advance!

---
magodo