Hello all,
I'm attempting to create a long term, stand alone backup archive script for a pretty active and large database (3TB) which will store backups going back a few months.
Using the combination of pg_basebackup in TAR format, with the `Xn` flag, and running pg_receivewal in the background, I'm ending up with a backup that is missing quite a few WAL files.
Trying to figure out where my understanding of the process is incorrect, and what I need to do to correct this.
Backup process:
* Start pg_receivewal to archive location* Then run: `pg_basebackup -h {host} -p {port} -D - -Ft -Xn --checkpoint=fast | ...`
* This backup is taken from a standby node
* Once pg_basebackup finishes, send SIGINT to pg_receivewal
Recovery Process:
* Untar basebackup to data directory* Copy archived WAL files to a separate directory on the same drive* settings: restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p'recovery_target = immediate
Once I start the DB, I immediately get
cp: cannot stat '/var/lib/pgsql/wal_archive/0000001200003CF80000003F': No such file or directory
WAL FILES:
backup_label file:0000001200003CF80000003F << First WAL file the DB looks for0000001200003CF700000075 << The next WAL file---------------------------------------- << Missing WAL files from pg_receivewal archive...0000001200003CF8000000C8 << First WAL file in pg_receivewal archive
START WAL LOCATION: 3CF7/75974400 (file 0000001200003CF700000075)CHECKPOINT LOCATION: 3CF8/3F1139C0BACKUP METHOD: streamedBACKUP FROM: standbySTART TIME: 2022-06-08 12:51:52 EDTLABEL: pg_basebackup base backupSTART TIMELINE: 18
I can see that START WAL LOCATION & CHECKPOINT LOCATION are where the basebackup starts to recover from, but how come these are not in my pg_receivewal archive, if I'm starting it before starting the pg_basebackup process? How can I ensure they are part of this archive?
Since this is a recent backup, I have a separate WAL archive from where I can just recover those using a different restore_command, and once that missing gap of WAL files is recovered from this archive, stopping the server
and switching back to `restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p'` and recovering the rest of the WAL files from the pg_receivewal archive works fine. The DB recovers and accepts connections shortly after.
The whole point of this is to have a several months old, stand alone backup, I need to have all the WAL files available in that original WAL archive created with pg_receivewal.
Appreciate the help, thank you in advance!