On 07/03/14 17:30, David Wall wrote:
>
> Bosco, maybe you can recommend a different approach.  I pretty much run
> daily backups that I only have for disaster recovery.  I generally don't
> do partial recoveries, so I doubt I'd ever modify the dump output.  I
> just re-read the docs about formats, and it's not clear what I'd be best
> off with, and "plain" is the default, but it doesn't say it can be used
> with pg_restore.

Correct. Plain is essentially one big SQL command file that you feed to
psql as the restore process.

> Maybe the --format=c isn't the fastest option for me,

I think you are on the right track with -Fc, since plain would end up
converting the LO parts to escaped byte format for storage in the SQL
command file.

> and I'm less sure about the compression.

You seem to be gaining a considerable amount of compression, and that is
probably worthwhile given that it is mostly going to be the 'easy' kind.

> I do want to be able to restore using pg_restore (unless plain is the
> best route, in which case, how do I restore that type of backup?), and
> I need to include large objects (--oids), but otherwise, I'm mostly
> interested in it being as quick as possible.

Hmmm. You are using '--oids' to *include* large objects? IIRC, that's not
the intent of that option. Large objects are dumped as part of a DB-wide
dump unless you request that they not be. However, if you restrict your
dumps to specific schemata and/or tables, then the large objects are NOT
dumped unless you request that they are. Something to keep in mind.

> Many of the large objects are gzip compressed when stored.  Would I be
> better off letting PG do its compression and remove gzip, or turn off
> all PG compression and use gzip?  Or perhaps use neither if my large
> objects, which take up the bulk of the database, are already compressed?

OK. Given all the above (and that gpg will ALSO compress unless told not
to), I'd go with the following (note the lowercase 'z' in the gpg
command). Note also that there may be a CPU vs. I/O trade-off that
muddies things, but the following are 'conceptually' true.

Fast but big
============
$ pg_dump -Z0 -Fc ... $DB | gpg -z0 ... | split

Less fast but smaller
=====================
$ pg_dump -Z1 -Fc ... $DB | gpg -z0 ... | split

(the '...' would be any access and/or key selection options)

Do some timing/size comparisons and see which is best for you.

BTW, is there any particular reason to do the 'split'?

For later consideration
=======================
Once you have this under control you may want to investigate the -Fd
format. It allows you to use the '-j' option, which may speed things up
even further. However, I have no experience with it, so I am unable to
make any concrete recommendations about it. It *would* require a
post-dump tar/zip/cpio to get to a single-file backup, though. If you
don't need a single-file solution (as possibly exemplified by your use
of 'split'), it may be just what you want.

Let us know how it goes. :-)

HTH,
Bosco.
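
P.S. Since you asked how a plain dump gets restored: you just feed the
file back to psql. For completeness, the restore side of both approaches
would look roughly like this (the file names below are only placeholders,
and the '...' means the same as above):

Plain
=====
$ psql -d $DB -f backup.sql

-Fc via the gpg/split pipeline above
====================================
$ cat backup_part.* | gpg -d ... | pg_restore -d $DB

For a straight full restore, pg_restore can read a custom-format dump
from a pipe, so the split pieces don't need to be reassembled into a
file on disk first.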
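
P.P.S. If you do experiment with -Fd later, it would look something like
the following (a sketch only, untested by me; the directory name and job
count are arbitrary, and '-j' on the dump side needs a recent enough
pg_dump):

$ pg_dump -Fd -Z1 -j 4 -f dumpdir $DB
$ pg_restore -j 4 -d $DB dumpdir

The output is a directory of per-table files rather than a single
archive, which is why you'd need the post-dump tar/zip/cpio step
mentioned above if you still want a single file.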