On 16.05.22 at 09:56, Thorsten Schöning wrote:
Hi everyone,

for various historical reasons I maintain a database containing large file uploads, which makes the uncompressed output of pg_dump currently ~200 GiB in size. I'm storing that dump on a NAS and trying to forward it from there via rsync to several additional offsite USB disks. I already do the same with the Postgres data directory after taking BTRFS snapshots, and for those files rsync works pretty well: lots of files are skipped entirely, some are slightly updated in place, and some updates are a bit larger depending on the actual changes and on when rsync last ran.

With the large dumps, however, it seems that every slight change in the actual data causes the entire dump to be downloaded again. I'm already using uncompressed dumps in the hope that the output is more stable and rsync is better able to recognize unchanged parts. But I suspect that most changes in the dumped data simply shift all subsequent data relative to what rsync compares against, so that in the end it amounts to downloading the whole file again.

Is that simply the way it is, or are there optimizations possible when using pg_dump? I'm on Postgres 11 and don't see anything that would help in this use case.

Thanks!

Kind regards

Thorsten Schöning
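For context, the workflow described above amounts to something like the following sketch; the database name, host, and paths are placeholders, not the actual setup:

    # Plain-format (uncompressed) dump written to the NAS; "mydb" and the
    # paths are hypothetical.
    pg_dump --format=plain --file=/mnt/nas/backups/mydb.sql mydb

    # Forward the dump from the NAS to an offsite disk; rsync's delta
    # transfer only pays off when large parts of the file are unchanged
    # and not shifted relative to the copy on the receiver.
    rsync -av /mnt/nas/backups/mydb.sql offsite-host:/mnt/usb/backups/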
Hi Thorsten,

This is an rsync question, not a pg_dump question. If you want to sync a new version of a file without transferring the whole thing, you have to use the option -c or --checksum.
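For reference, the suggested option would be used roughly like this, with the same placeholder paths as in the sketch above:

    # -c/--checksum makes rsync decide which files to transfer by comparing
    # full-file checksums instead of size and modification time.
    rsync -av --checksum /mnt/nas/backups/mydb.sql offsite-host:/mnt/usb/backups/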
That works well only if some blocks of the file have changed while most others haven't, which won't be the case with the output of a pg_dump.
So I don't see a way of re-syncing it the way you expect.

Regards,
Holger

--
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012