Laurenz Albe <laurenz.albe@xxxxxxxxxxx> writes:
> Egor Duda wrote:
>> I've recently tried to use borg backup (https://borgbackup.readthedocs.io/) to store multiple
>> PostgreSQL database dumps, and encountered a problem. Due to the nondeterministic nature of
>> pg_dump, the rows of data tables come out in a different order on each invocation, which breaks
>> borg backup's chunking and deduplication algorithm.
>>
>> This means that each new dump in the backup almost never reuses data from previous dumps, so it
>> is not possible to store multiple database dumps as efficiently as possible.
>>
>> I wonder if there's any way to force pg_dump to use some predictable ordering of data rows (for
>> example, by primary key, where possible) to make dumps more uniform, similar to mysqldump's
>> --order-by-primary option?

> There is no such option.
> I think you would be better off with physical backups using "pg_basebackup" if you
> want to deduplicate, at least if deduplication is on the block level.

I think the OP is fooling himself.  pg_dump is perfectly deterministic:
dump the same DB twice and you'll get identical outputs.  The only way the
observed row order would vary so radically from run to run is if there's a
great deal of row update activity in between, causing rows to get
relocated in the heap.  If there is, and assuming that his application
isn't so dumb as to be issuing lots of no-op updates, then the data is
changing a lot.  Therefore there aren't going to be all that many exact
duplicate blocks, no matter whether you define "block" as a physical data
block or as a group of rows consecutive in the PK order.  So this doesn't
sound like a case where dedup'ing is going to be very helpful for
compressing backups.  Conceivably sorting the rows would help at the
margin, but I doubt it'd help enough to justify the cost of the sort.

			regards, tom lane
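
Since pg_dump has no equivalent of mysqldump's --order-by-primary, the closest
workaround to the OP's request is to export each table's data separately,
sorted by its primary key. The script below is only a minimal sketch of that
idea, not anything proposed in this thread: it assumes a database named
"mydb", data tables in the "public" schema, and identifiers that need no
special quoting (none of those names come from the discussion above).

    #!/bin/sh
    # Illustrative sketch only: dump each table's rows in primary-key order
    # so that successive dumps look more alike to a deduplicating backup
    # tool.  Assumes database "mydb" and tables in the "public" schema.
    DB=mydb
    OUT=ordered-dump
    mkdir -p "$OUT"

    TABLES=$(psql -At -d "$DB" -c \
        "SELECT tablename FROM pg_tables WHERE schemaname = 'public'")

    for T in $TABLES; do
        # Comma-separated list of the table's primary-key columns, in key order.
        PK=$(psql -At -d "$DB" -c "
            SELECT string_agg(quote_ident(kcu.column_name::text), ', '
                              ORDER BY kcu.ordinal_position)
            FROM information_schema.table_constraints tc
            JOIN information_schema.key_column_usage kcu
              ON kcu.constraint_name = tc.constraint_name
             AND kcu.table_schema = tc.table_schema
            WHERE tc.constraint_type = 'PRIMARY KEY'
              AND tc.table_schema = 'public'
              AND tc.table_name = '$T'")

        if [ -n "$PK" ]; then
            # COPY the rows out sorted by the primary key.
            psql -d "$DB" -c \
                "\copy (SELECT * FROM public.$T ORDER BY $PK) TO '$OUT/$T.csv' CSV"
        else
            echo "skipping $T: no primary key" >&2
        fi
    done

Tables without a primary key are simply skipped here, and, as the reply above
points out, whether the sorted output actually deduplicates better depends on
how much of the underlying data changes between dumps.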