Using pg_rewind for DR testing / missing wal files

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi all,

Currently I'm running pgsql 10.4 on a centos7 machine as a primary database.  I have a standby database in GCP with the same OS/pgsql version.

What I'm trying to do is promote the standby every morning around 0600 so that applications can connect and teams can do some testing throughout the day.   When dev teams are done testing around 1800 I want to pg_rewind the standby so it gets caught back up where I can then repeat this process the next day.

Things work okay for the most part but one hitch I'm running into that I can't wrap my head around is this:

1. I promoted the standby today at about 0623.
2. I left it running all day
3. At about 1950 I went to pg_rewind the standby using the following command:
/usr/pgsql-10/bin/pg_rewind -D /var/lib/pgsql/10/data/ --source-server='postgresql://drusr:x@10.20.6.10/postgres' -P

It immediately failed that it couldn't find a wal file:
connected to server
servers diverged at WAL location 22C3/99779318 on timeline 2
could not open file "/var/lib/pgsql/10/data//pg_wal/00000002000022C300000098": No such file or directory

Now, looking at the wal file in my archive log dir on the primary, the timestamp of the file was about 0623.

I scp'd the wal to the standby and reran the pg_rewind command.  This time is couldn't find the file before:
connected to server
servers diverged at WAL location 22C3/99779318 on timeline 2
could not open file "/var/lib/pgsql/10/data//pg_wal/00000002000022C300000097": No such file or directory

I repeated this process for about 8 wal files where the last one "/var/lib/pgsql/10/data//pg_wal/00000002000022C300000092" was timestamp at 0621 - two minutes before I promoted the standby.

Going through the mailing list archives I thought I read somewhere that maybe I needed to checkpoint the standby in order to prevent this very issue - but that doesn't seem to help.

My question is, how can I achieve the behavior I'd like where pg_rewind "just works" (I don't have to copy wal files from the primary to the standby).

I did find an email in the archives from 2018 with the subject "pg_rewind: confused about recovery.conf/restore_command" which the author thought that the restore_command may be the solution - but didn't actually work out.

Thanks,

Matt

[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux