On Mon, Mar 7, 2011 at 7:28 AM, chris r. <chricki@xxxxxxx> wrote:
> Merlin, first of all, thanks for your reply!
>
>> hm. where exactly is all this time getting spent? Are you i/o bound?
>> cpu bound? Is there any compression going on?
> Very good questions. pg_dump -F c compresses by default "at a moderate
> level" (manpage), whatever compression level 'moderate' actually means.
> So yes, without explicitly asking for it, we use compression.
>
> For testing, I inserted a fraction of our huge table with bytea content
> into the table 'testtable'. The next three outputs compare pg_dump for
> this table with the default compression level, no compression, and
> low-level compression (level 3). The time spent seems CPU-bound: in the
> first test case, 90-100% of a CPU core is used the whole time.
>
> (default compression)
> time pg_dump -f /tmp/test.sql -F c -t testtable mydb
> real    0m27.255s
> user    0m26.383s
> sys     0m0.180s
>
> (low-level compression)
> time pg_dump -f /tmp/test.sql -F c -Z 3 -t testtable mydb
> real    0m8.883s
> user    0m8.112s
> sys     0m0.161s
>
> (no compression)
> time pg_dump -f /tmp/test.sql -F c -Z 0 -t testtable mydb
> real    0m1.892s
> user    0m0.074s
> sys     0m0.279s
>
> To summarize, in our scenario, moderate-level compression caused a
> factor-14 slowdown.

right -- well, in the short term it looks like you should consider
lowering or disabling compression.

> In another test, I compared pg_dump on a table with textual content
> that I naively created with:
>
> select (t/23.0)::text||(t/17.0)::text
> into testtable
> from generate_series(1, 1000000) t;
>
> Very much to my surprise, dumping this table did not show such a huge
> difference when using compression: a default-compressed pg_dump took
> 2.4s, whereas an uncompressed pg_dump took 2.0s (merely a factor of
> 1.2x). However, when expanding the series to 3 million rows (instead of
> 1 million), the compressed pg_dump took 7.0s, whereas an uncompressed
> pg_dump ran for only 2.4s (factor 3x). Does this show that compression
> takes relatively longer the more data it needs to compress? Memory
> consumption was less than 12 MB during testing.

Most compression algs don't use a lot of memory. Also, as a general
rule of thumb, low-entropy data compresses much faster than
high-entropy data, so you can't really compare synthetic tests like
that to real-world data, as you discovered.

Unfortunately, compression is something of a weak point for the
postgres project: there are much better bang/buck ratio algorithms out
there that we can't use because of licensing or patent concerns.
There are a lot of easy workarounds though (like rigging a command-line
compressor to run on the dump), so it isn't really a big deal for
backups. You may also want to investigate whether your bytea columns
are being TOAST-compressed and look there if you are having
performance issues.
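As a rough sketch of the external-compressor idea (untested here;
pigz is just one example of a command-line compressor, and the output
path is made up):

pg_dump -F c -Z 0 mydb | pigz > /tmp/mydb.dump.gz

And a quick way to check whether a column's storage even allows TOAST
compression (again only a sketch): attstorage 'x' means extended
storage (out of line, compression allowed), 'e' means external (out of
line, never compressed):

select attname, attstorage
from pg_attribute
where attrelid = 'testtable'::regclass and attnum > 0;

Comparing pg_column_size(col) against octet_length(col) on a few rows
would then show whether the values are actually stored compressed.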
>> Maybe this is a
>> performance issue inside pg_dump itself, not necessarily a text/binary
>> issue (i have a hard time believing going from b64->hex is 10x slower
>> on format basis alone). Can you post times comparing manual COPY via
>> text, manual COPY via binary, and pg_dump -F c?
> Again, valid points. As a next step, I compared the COPY variants.
>
> time psql mydb -c "COPY testtable TO '/tmp/test.sql' WITH (FORMAT 'text');"
> real    0m1.712s
> user    0m0.001s
> sys     0m0.004s
>
> In text format, the time it takes to COPY testtable is very much like
> the time it takes to run pg_dump without compression.
>
> Interestingly, COPYing testtable in binary format gives another factor
> 3.6x speedup:
>
> time psql mydb -c "COPY testtable TO '/tmp/test.sql' WITH (FORMAT
> 'binary');"
> real    0m0.470s
> user    0m0.000s
> sys     0m0.005s
>
> As one might argue the table was too small to compare runtimes, I
> repeated this second comparison with two larger tables - both times
> showing a 5x-6x speedup with binary format! In either format the
> operation seemed CPU-bound (> 95% of a core was taken).
>
> To summarize, I could speed up my backup by removing compression
> (factor 14) and by using COPY in binary format instead of pg_dump
> (factor 5 to 6). However, only the first option would keep data
> integrity. To have an easy, integrity-safe backup, IMHO, the second
> option can only be achieved by adding a switch to pg_dump that allows
> binary output.

Well, that's a pretty telling case, although I'd venture to say not a
typical one. In average databases, I'd expect improvement in the
10-50% range going from text->binary, which is often not enough to
justify the compatibility issues. Does it justify a 'binary' switch to
pg_dump? I'd say so -- as long as the changes required aren't too
extensive (although you can expect disagreement on that point).

hm. i'll take a look...

merlin

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general