Re: Improving pg_dump performance

Stephen Frost <sfrost@xxxxxxxxxxx> · Mon, 23 Jul 2018 09:32:30 -0400

Greetings,

* Ron (ronljohnsonjr@xxxxxxxxx) wrote:
> We've got an old (v8.4.17, thus no parallel backups) 2.9TB database that
> needs to be migrated to a new data center and then restored to v9.6.9.

You should be using 9.6's pg_dump to perform the export.  Might be a bit
annoying to do, but you should be able to install it on to a nearby
server or the same server as 8.4 is running on but in another location.

With 9.6's pg_dump, you could use parallel mode, but you have to prevent
anything from changing the data between when the first connection from
pg_dump is made until all of the connections have completed and started
their transactions (should be just a few seconds, really).  Of course,
that export won't include any changes after the pg_dump starts, so
you'll need a way to manage those.

> The database has many large tables full of bytea columns containing pdf
> images, and so the dump file is going to be more than 2x larger than the
> existing data/base...

The dump file isn't going to include any content from indexes and, at
least looking at some PDFs locally, they can certainly be compressed
effectively sometimes, and they might be getting compressed today in
your 8.4 instance thanks to TOAST, and more to the point, the textual
representation of a bytea which will end up in the export would almost
certainly be compressable too.

> The command is:
> $ pg_dump -v -Z0 -Fc $DB --file=${TARGET}/${DATE}_${DB}.dump 2>
> ${DATE}_${DB}.log
> 
> Using -Z0 because pdf files are already compressed.

They aren't really PDF files in the export though- they're bytea's
represented in a textual format.  Have you tested the difference between
using -Z0 and allowing compression to happen?

> Because of an intricate web of FK constraints and partitioned tables, the
> customer doesn't trust a set of "partitioned" backups using --table= and
> regular expressions (the names of those big tables all have the year in
> them), and so am stuck with a single-threaded backup.

All of the FKs will be re-checked when the data is imported into the new
instance.

> Are there any config file elements that I can tweak (extra points for not
> having to restart postgres) to make it run faster, or deeper knowledge of
> how pg_restore works so that I could convince them to let me do the
> partitioned backups?

pg_restore isn't doing much more than restoring what's in the backup
into the database using COPY commands.  Since it's an export/import, all
the FKs and other constraints will be re-checked and all indexes will be
rebuilt during the import.

> Lastly, is there any way to not make the backups so large (maybe by using
> the --binary-upgrade option, even though the man page says, "in-place
> upgrades only")?

You could possibly upgrade the existing system from 8.4 to 9.6 in-place
(which would require a bit of downtime but typically on the order of
minutes instead of many hours) and then take a filesystem-level backup
using a tool like pgBackRest and then restore that at the new data as a
replica and use streaming replication until you're ready to cut over to
the new data center.  That's probably how I'd tackle this anyway, though
one nice thing about the dump/restore is that you could have checksums
enabled on the new cluster.

Thanks!

Stephen
Attachment:
signature.asc

Description: PGP signature