Re: Question about pg_wal.tar.gz generated by pg_basebackup

Dhirendra Singh <dhirendraks@xxxxxxxxx> · Wed, 10 May 2023 21:02:54 +0530

Hi Stephen, Thanks for the reply.
I forgot to mention that I am also archiving the generated WAL files. My question was in this context.
According to your answers, it looks like pg_wal.tar.gz is not needed in that case.

Thanks,
Dhirendra.

On Wed, May 10, 2023 at 5:36 PM Stephen Frost <sfrost@xxxxxxxxxxx> wrote:
Greetings,

* Dhirendra Singh (dhirendraks@xxxxxxxxx) wrote:

> I am taking a backup using pg_basebackup for the purpose of point in time
> recovery in case something goes wrong with the data.

pg_basebackup works decently for this but it's pretty basic.

> The backup generates pg_wal.tar.gz which contains the WAL files generated
> during the backup.

Yes, those are the WAL files needed to restore that backup.

> Is it of any use in point in time recovery? Do I have to archive it?

Yes and no ... if you're independently archiving your WAL then you don't
really need the WAL from pg_basebackup, but if you're not also archiving
your WAL then you absolutely must keep the WAL from pg_basebackup.
The -X / --wal-method option of pg_basebackup (none, fetch, or stream)
lets you choose whether the backup includes the WAL.

> According to the documentation, any WAL files in the pg_wal directory have to be
> deleted after the base backup is restored and before the recovery starts.

Strictly speaking, the thing to do would be to put the WAL from
pg_basebackup somewhere and then write a restore_command that is
able to pull that WAL when PG asks for it.  While the documentation says
that, we will still look in the WAL directory as the place of last
resort when doing replay, so putting the WAL from pg_wal.tar.gz in there
will work too.
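
A minimal sketch of what such a restore command could do, assuming the
segments from pg_wal.tar.gz were unpacked into a directory of your
choosing. PostgreSQL substitutes %f (the WAL file name) and %p (the
destination path) in restore_command; the directory, script name, and
function names below are hypothetical.

```python
import shutil
from pathlib import Path

# Hypothetical directory holding the WAL unpacked from pg_wal.tar.gz
# (or any other WAL archive you maintain).
WAL_SOURCE_DIR = Path("/var/backups/pg/wal")


def restore_wal(file_name, dest_path, source_dir=WAL_SOURCE_DIR):
    """Copy one requested WAL file to dest_path and return an exit code."""
    source = Path(source_dir) / file_name
    if not source.is_file():
        # A non-zero status tells PostgreSQL the file is not available here.
        return 1
    shutil.copyfile(source, dest_path)
    return 0


def main(argv):
    """Entry point for: restore_command = 'python3 restore_wal.py %f %p'."""
    if len(argv) != 3:
        return 2
    return restore_wal(argv[1], argv[2])

# In a real script the last line would be: sys.exit(main(sys.argv))
```

Returning non-zero for a missing file matters: recovery regularly asks
for files that are not in the archive, and that is how it learns it has
reached the end of the available WAL.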

> So if I understand it correctly, pg_wal.tar.gz is not required for point in
> time recovery...but I just want to confirm that my understanding is correct.

Archiving with archive_command (or using pg_receivewal) is required for
PITR.

> My sole purpose of taking the backup is for point in time recovery. I am
> not going to use it to create a standby server.

Have you configured an archive_command...?  If not, you'll need to, to
get any proper PITR.  What pg_basebackup is giving you is really a
single, but complete, backup, and that will only restore to the end of
that backup.  To go any farther you'll need the WAL that was written
after that backup was taken, and to get that you need to set up an
archive_command or use pg_receivewal.
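
As a rough illustration of the archiving side, here is a sketch of a
script an archive_command could call. PostgreSQL substitutes %p (path of
the file to archive) and %f (its file name); the archive directory and
function name are assumptions. The two rules it follows are that it
reports success only when the copy completed and that it refuses to
overwrite a file already in the archive.

```python
import shutil
from pathlib import Path

# Settings that would go with this in postgresql.conf (script path is
# hypothetical):
#   wal_level = replica
#   archive_mode = on
#   archive_command = 'python3 /usr/local/bin/archive_wal.py %p %f'


def archive_wal(src_path, file_name, archive_dir):
    """Copy one finished WAL segment into archive_dir; return an exit code."""
    target = Path(archive_dir) / file_name
    if target.exists():
        # Never overwrite an archived segment; fail so the problem is noticed.
        return 1
    # Copy under a temporary name first so a partial copy is never
    # mistaken for a complete segment.
    tmp = target.with_name(target.name + ".tmp")
    shutil.copyfile(src_path, tmp)
    tmp.replace(target)
    return 0
```

This sketch does not fsync the copied file or its directory, which a
production archiver would want to do before reporting success.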

Note that with pg_basebackup and pg_receivewal or some hand-written
archive_command, you still need to write your own backup and WAL
retention code, figure out at restore time which backup to use to
get to the point in time you want, deal with pg_basebackup and
pg_receivewal being single-threaded, and handle a bunch of other
things that purpose-built tooling like pgBackRest was written
explicitly to deal with.
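
To give a feel for one of those pieces, here is a small sketch of
choosing which base backup to restore for a given recovery target. It
assumes you record a label and a completion time for each backup
yourself; the Backup type and pick_backup function are hypothetical and
not part of any PostgreSQL tool.

```python
from datetime import datetime
from typing import NamedTuple, Optional, Sequence


class Backup(NamedTuple):
    label: str
    stop_time: datetime  # when the base backup finished


def pick_backup(backups: Sequence[Backup], target: datetime) -> Optional[Backup]:
    """Return the most recent backup that finished at or before target."""
    # A base backup cannot be used to recover to a time while it was
    # still in progress, so only completed-before-target backups qualify.
    candidates = [b for b in backups if b.stop_time <= target]
    return max(candidates, key=lambda b: b.stop_time, default=None)
```

Retention then follows from the same bookkeeping: WAL must be kept from
the start of the oldest backup you still want to be able to restore.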

You might save a lot of time by checking out pgBackRest instead,
especially for PITR.

Thanks,
Stephen