Re: pg_dump slow with bytea data

"chris r." <chricki@xxxxxxx> · Mon, 07 Mar 2011 14:28:52 +0100

Merlin, first of all, thanks for your reply!

> hm.  where exactly is all this time getting spent?  Are you i/o bound?
> cpu bound? Is there any compression going on?
Very good questions. pg_dump -F c compresses per default "at a moderate
level" (manpage), whatever compression level 'moderate' actually means.
Thus, yes, without explicitly activating it, we use compression.

For testing, I inserted a fraction of our huge table with bytea content
to the table 'testtable'. The next three outputs compare pg_dump for
this table with default compression level, no compression and low-level
compression on level 3. The time spent seems CPU-bound, as in the first
test case 90-100% of a CPU-core is used all over the time.

(default compression)
time pg_dump -f /tmp/test.sql -F c -t testtable mydb
real    0m27.255s
user    0m26.383s
sys     0m0.180s

(low-level compression)
time pg_dump -f /tmp/test.sql -F c -Z 3 -t testtable mydb
real    0m8.883s
user    0m8.112s
sys     0m0.161s

(no compression)
time pg_dump -f /tmp/test.sql -F c -Z 0 -t testtable mydb
real    0m1.892s
user    0m0.074s
sys     0m0.279s

To summarize, in our case-scenario, moderate-level compression caused a
speed-loss of factor 14.

In another test, I'll compare pg_dump of a table with textual content
that I created stupidly with:

 select (t/23.0)::text||(t/17.0)::text
 into testtable
 from generate_series(1, 1000000) t;

Very much to my surprise, dumping this table did not show such a huge
difference when using compression: a default-compressed pg_dump took
2.4s, whereas a non-compressed pg_dump took 2.0s (which is merely factor
1.2x). However, when expanding the series to 3 mio (instead of 1 mio),
the compressed pg_dump took 7.0s, whereas a non-compressed pg_dump ran
for 2.4s only (factor 3x). Does this show that compression takes
relatively longer the more data it needs to compress? Memory consumption
was less than 12 MB during testing.

> Maybe this is a
> performance issue inside pg_dump itself, not necessarily a text/binary
> issue (i have a hard time believing going from b64->hex is 10x slower
> on format basis alone).  Can you post times comparing manual COPY via
> text, manual COPY via binary, and pg_dump -F c?
Again, valid points. As a next step, I'll compare the COPY variants.

time psql mydb -c "COPY testtable TO '/tmp/test.sql' WITH (FORMAT 'text');"
real    0m1.712s
user    0m0.001s
sys     0m0.004s

In text format, the time it takes to COPY testtable seems very much like
the time it takes to run pg_dump without compression.

Interestingly, COPYing testtable with binary format gives another factor
3.6x speedup:

time psql mydb -c "COPY testtable TO '/tmp/test.sql' WITH (FORMAT
'binary');"
real    0m0.470s
user    0m0.000s
sys     0m0.005s

As one may argue the table was too small to compare the runtime, I
repeated this second comparison with two larger tables - both times
showing between 5x-6x speedup with binary format! In either format the
operation seemed CPU bound (> 95% of a core was taken).

To summarize, I could speed up my backup by removing compression (factor
14) and using COPY in binary format instead of pg_dump (factor 5 to
factor 6). However, only the first option would keep data integrity. To
have an easy integrity-save backup, IMHO, the second option can only be
achieved by having an additional switch in pg_dump allowing for binary
output.

Any comments on these measurements? Thanks again for your input!

Regards,
Chris

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general