Re: pg_rewind copies so much data

Hung Phan <hungphan227@xxxxxxxxx> · Wed, 13 Sep 2017 12:21:14 +0700

Hi,
Thanks for your response. I have just repeated the switchover between master and slave:

- one master and one slave (the total data size of each server is more than 4GB). At this point the last log line on the slave is "started streaming WAL from primary at 2/D6000000 on timeline 10".

- stop the master; the slave shows the following logs:
          replication terminated by primary server
          End of WAL reached on timeline 10 at 2/D69304D0
          Invalid record length at 2/D69304D0
          could not connect to primary server

- promote the slave:
          received promote request
          redo done at 2/D6930460
          selected new timeline ID: 11
          archive recovery complete
          MultiXact member wraparound protections are now enabled
          database system is ready to accept connections
          autovacuum launcher started

- start and stop the old master, then run pg_rewind (all of this is done immediately after promoting the slave). Output of pg_rewind:
          servers diverged at WAL position 2/D69304D0 on timeline 10
          rewinding from last common checkpoint at 2/D6930460 on timeline 10
          reading source file list
          reading target file list
          reading WAL in target
          need to copy 4168 MB (total source directory is 4186 MB)
          4268372/4268372 kB (100%) copied
          creating backup label and updating control file
          syncing target data directory
          Done!
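The numbers in the pg_rewind output above can be cross-checked with a little arithmetic. A minimal Python sketch, using only values copied from that output: an LSN such as 2/D69304D0 is a 64-bit WAL position written as two 32-bit hexadecimal halves, so the distance between the last common checkpoint and the divergence point is a plain subtraction.

```python
# Cross-check of the pg_rewind output above; every number is copied from it.

def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '2/D69304D0' to a byte position.

    The part before the slash is the high 32 bits and the part after it is
    the low 32 bits, both in hexadecimal.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


diverged_at = lsn_to_int("2/D69304D0")      # "servers diverged at WAL position"
last_checkpoint = lsn_to_int("2/D6930460")  # "last common checkpoint"
wal_gap = diverged_at - last_checkpoint

copied_kb = 4268372  # "4268372/4268372 kB (100%) copied"
need_mb = 4168       # "need to copy 4168 MB"
total_mb = 4186      # "total source directory is 4186 MB"

print(f"WAL between checkpoint and divergence: {wal_gap} bytes")
# -> WAL between checkpoint and divergence: 112 bytes
print(f"copied: {copied_kb / 1024:.0f} MB, {need_mb / total_mb:.1%} of the source directory")
# -> copied: 4168 MB, 99.6% of the source directory
```

So the last common checkpoint sits only 112 bytes of WAL before the fork point, while the copy covered about 99.6% of the source directory. The output does not say how much WAL the old master wrote after the fork, so this arithmetic alone cannot measure that.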

If I run pg_rewind with the --debug option, it just shows a long list of additional files being copied in directories such as base and pg_tblspc. I am sure that no data was inserted or modified after the first step; the only difference between the two servers comes from restarting the old master.

Thanks and Regards,

Hung Phan

On Wed, Sep 13, 2017 at 10:48 AM, Michael Paquier <michael.paquier@xxxxxxxxx> wrote:
On Wed, Sep 13, 2017 at 12:41 PM, Hung Phan <hungphan227@xxxxxxxxx> wrote:

> I have tested pg_rewind (version 9.5) with the following scenario:
>
> - one master and one slave (total size of each server is more than 4GB)
> - set wal_log_hints=on and restart both
> - stop master, promote slave
> - start old master again (now the two servers have diverged)
> - stop old master, run pg_rewind with the --progress option

That's a good flow. Don't forget to run a manual checkpoint after
promotion to update the control file of the promoted standby, so that
pg_rewind is able to identify the timeline difference between the
source and the target servers.
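To make the checkpoint advice concrete, here is a hypothetical, much simplified model of the check being described: pg_rewind decides whether the clusters diverged by comparing the timeline recorded with the latest checkpoint in each cluster's control file. The class and function names are illustrative only, not pg_rewind's internals; the timeline numbers 10 and 11 are taken from the logs earlier in this thread.

```python
# Hypothetical model of the timeline check described above; the names are
# illustrative and are not pg_rewind's internals.
from dataclasses import dataclass


@dataclass
class ControlFile:
    # Timeline ID stored with the latest checkpoint in pg_control.
    checkpoint_timeline: int


def can_detect_divergence(source: ControlFile, target: ControlFile) -> bool:
    """The model needs the two control files to report different timelines."""
    return source.checkpoint_timeline != target.checkpoint_timeline


old_master = ControlFile(checkpoint_timeline=10)
# Right after promotion the new master may still report the old timeline,
# because the checkpoint timeline in pg_control only changes when the next
# checkpoint completes.
promoted_no_checkpoint = ControlFile(checkpoint_timeline=10)
promoted_after_checkpoint = ControlFile(checkpoint_timeline=11)

print(can_detect_divergence(promoted_no_checkpoint, old_master))     # False
print(can_detect_divergence(promoted_after_checkpoint, old_master))  # True
```

Running a manual CHECKPOINT on the promoted standby moves it from the first case to the second without waiting for the next scheduled checkpoint.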

> pg_rewind ran successfully, but I saw it copied more than 4GB (4265891 kB
> copied). There was only a very minor difference between the two servers, so
> why did pg_rewind copy almost all the data of the new master?

Without knowing exactly which files have been registered to be copied
from the active source to the target, it is hard to draw a conclusion.
But my bet here is that you left the target server online long enough
for a bunch of blocks to be updated, causing more relation blocks to be
copied from the source because more effort is needed to re-sync it.
That's only an assumption without clear numbers, which could be found
in the --debug messages of pg_rewind.
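The reasoning above, that more blocks updated on the target mean more data to copy, can be illustrated with a rough model of how pg_rewind builds its copy list, following its documented behavior: relation files present on both sides only get the blocks touched in the target's WAL since the last common checkpoint (plus any tail the target is missing), while all other files are copied whole. This is a hypothetical Python sketch, not pg_rewind's code; it ignores removal of target-only files, symlinks, skipped files and permissions, and assumes the default 8 kB block size. The paths and sizes in the usage lines are made up.

```python
# Hypothetical, simplified model of pg_rewind's copy planning (not its code).
# Assumes the default 8 kB block size; ignores target-only files (which
# pg_rewind removes), symlinks, skipped files and permissions.
from dataclasses import dataclass

BLCKSZ = 8192  # assumption: default PostgreSQL block size


@dataclass
class FileInfo:
    size: int          # file size in bytes
    is_relation: bool  # True for relation data files, e.g. under base/


def plan_copy(source: dict[str, FileInfo],
              target: dict[str, FileInfo],
              touched_blocks: dict[str, set[int]]) -> int:
    """Return the number of bytes this model copies from source to target.

    touched_blocks maps a relation file path to the block numbers that the
    target's WAL modified since the last common checkpoint.
    """
    total = 0
    for path, src in source.items():
        tgt = target.get(path)
        if tgt is None or not src.is_relation:
            # Missing on the target, or not a relation file: copy it whole.
            total += src.size
            continue
        if src.size > tgt.size:
            # The target lacks the end of the file: copy the tail.
            total += src.size - tgt.size
        # Re-fetch touched blocks in the range both files share; blocks past
        # the source's end go away when the target file is truncated.
        shared = min(src.size, tgt.size)
        for blkno in touched_blocks.get(path, set()):
            if blkno * BLCKSZ < shared:
                total += BLCKSZ
    return total


# Illustrative file lists; the paths and sizes are made up.
src_files = {
    "base/1/1234": FileInfo(size=100 * BLCKSZ, is_relation=True),
    "postgresql.conf": FileInfo(size=22_000, is_relation=False),
}
tgt_files = {
    "base/1/1234": FileInfo(size=100 * BLCKSZ, is_relation=True),
    "postgresql.conf": FileInfo(size=22_000, is_relation=False),
}
print(plan_copy(src_files, tgt_files, {"base/1/1234": {3, 7}}))  # 38384
```

In this model two touched blocks cost 16 kB on top of the whole-file copy of the configuration file. The per-file decisions that pg_rewind actually made are what its --debug output lists.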

--
Michael