On 16.05.22 at 09:56, Thorsten Schöning wrote:
Hi everyone,

for various historical reasons I maintain a database containing large file uploads, which makes the uncompressed output of pg_dump currently ~200 GiB in size. I'm storing that dump on a NAS and trying to forward it from there via rsync to several additional offsite USB disks. I already do the same with the Postgres data directory after taking BTRFS snapshots, and for those files rsync works pretty well: lots of files are skipped entirely, some are slightly updated in place, and some updates are a bit larger depending on the actual changes and on when rsync last ran.

With the large dumps, however, it seems that every slight change in the actual data causes the entire dump to be downloaded again. I'm already using uncompressed dumps in the hope that the output is more stable and rsync is better able to recognize unchanged parts. But I suspect that most changes in the dumped data simply shift all subsequent data relative to what rsync compares against, so that in the end it amounts to downloading the whole file again.

Is that simply the way it is, or are there optimizations possible when using pg_dump? I'm on Postgres 11 and don't see anything that would help in this use case.

Thanks!

Kind regards

Thorsten Schöning
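For context, the workflow described above amounts to something like the following sketch; the database name, host, and paths are placeholders, not the actual setup:

    # Plain-format (uncompressed) dump written to the NAS; "mydb" and the
    # paths are hypothetical.
    pg_dump --format=plain --file=/mnt/nas/backups/mydb.sql mydb

    # Forward the dump from the NAS to an offsite disk; rsync's delta
    # transfer only pays off when large parts of the file are unchanged
    # and not shifted relative to the copy on the receiver.
    rsync -av /mnt/nas/backups/mydb.sql offsite-host:/mnt/usb/backups/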
Hi Thorsten,

This is an rsync question, not a pg_dump question. If you want to sync a new version of a file without transferring the whole thing, you have to use the option -c or --checksum.
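For reference, the suggested option would be used roughly like this, with the same placeholder paths as in the sketch above:

    # -c/--checksum makes rsync decide which files to transfer by comparing
    # full-file checksums instead of size and modification time.
    rsync -av --checksum /mnt/nas/backups/mydb.sql offsite-host:/mnt/usb/backups/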
That works well only if some blocks of the file have changed while most others haven't, which won't be the case with the output of a pg_dump.
So I don't see a way of re-syncing it the way you expect.

Regards,
Holger

--
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012