pg_dump performance

Jared Mauch <jared@xxxxxxxxxxxxxxx> · Wed, 26 Dec 2007 15:22:30 -0500



	I've been looking at the performance of pg_dump in
the past week off and on trying to see if I can get it to
work a bit faster and was looking for tips on this.

	doing a pg_dump on my 34311239 row table (1h of data btw)
results in a wallclock time of 187.9 seconds or ~182k rows/sec.

	I've got my insert (COPY) performance around 100k/sec and
was hoping to get the reads to be much faster.  The analysis I'm
doing is much faster doing a pg_dump than utilizing a few
queries for numerous reasons.  (If you care, I can enumerate them
to you privately but the result is pg_dump is the best way to handle
the multiple bits of analysis that are needed, please trust me).

	What i'm seeing:

	pg_dump is utilizing about 13% of the cpu and the
corresponding postgres backend is at 100% cpu time.
(multi-core, multi-cpu, lotsa ram, super-fast disk).

	I'm not seeing myself being I/O bound so was interested
if there was a way I could tweak the backend performance or
offload some of the activity to another process.

	pg8.3(beta) with the following variances from default

checkpoint_segments = 300        # in logfile segments, min 1, 16MB each
effective_cache_size = 512MB    # typically 8KB each
wal_buffers = 128MB                # min 4, 8KB each
shared_buffers = 128MB            # min 16, at least max_connections*2, 8KB each
work_mem = 512MB                 # min 64, size in KB


	unrelated but associated data, the table has one index on it.
not relevant for pg_dump but i'm interested in getting better concurent index
creation (utilize those cpus better but not slow down my row/sec perf)
but that's another topic entirely..

	Any tips on getting pg_dump (actually the backend) to perform 
much closer to 500k/sec or more?  This would also aide me when I upgrade 
pg versions and need to dump/restore with minimal downtime (as the data 
never stops coming.. whee).

	Thanks!

	- Jared

-- 
Jared Mauch  | pgp key available via finger from jared@xxxxxxxxxxxxxxx
clue++;      | http://puck.nether.net/~jared/  My statements are only mine.

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match