On 7/31/09 4:01 PM, "PFC" <lists@xxxxxxxxxx> wrote: > On Fri, 31 Jul 2009 19:04:52 +0200, Tom Lane <tgl@xxxxxxxxxxxxx> wrote: > >> Greg Stark <gsstark@xxxxxxx> writes: >>> On Thu, Jul 30, 2009 at 11:30 PM, Tom Lane<tgl@xxxxxxxxxxxxx> wrote: >>>> I did some tracing and verified that pg_dump passes data to deflate() >>>> one table row at a time. I'm not sure about the performance >>>> implications of that, but it does seem like it might be something to >>>> look into. >> >>> I suspect if this was a problem the zlib people would have added >>> internal buffering ages ago. I find it hard to believe we're not the >>> first application to use it this way. >> >> I dug into this a bit more. zlib *does* have internal buffering --- it >> has to, because it needs a minimum lookahead of several hundred bytes >> to ensure that compression works properly. The per-call overhead of >> deflate() looks a bit higher than one could wish when submitting short >> chunks, but oprofile shows that "pg_dump -Fc" breaks down about like >> this: > > During dump (size of dump is 2.6 GB), > > No Compression : > - postgres at 70-100% CPU and pg_dump at something like 10-20% > - dual core is useful (a bit...) > - dump size 2.6G > - dump time 2m25.288s > > Compression Level 1 : > - postgres at 70-100% CPU and pg_dump at 20%-100% > - dual core is definitely useful > - dump size 544MB > - dump time 2m33.337s > > Since this box is mostly idle right now, eating CPU for compression is no > problem... > I get very different (contradictory) behavior. Server with fast RAID, 32GB RAM, 2 x 4 core 3.16Ghz Xeon 54xx CPUs. CentOS 5.2 8.3.6 No disk wait time during any test. One test beforehand was used to prime the disk cache. 100% CPU in the below means one core fully used. 800% means the system is fully loaded. pg_dump > file (on a subset of the DB with lots of tables with small tuples) 6m 27s, 4.9GB; 12.9MB/sec 50% CPU in postgres, 50% CPU in pg_dump pg_dump -Fc > file.gz 9m6s, output is 768M (6.53x compression); 9.18MB/sec 30% CPU in postgres, 70% CPU in pg_dump pg_dump | gzip > file.2.gz 6m22s, 13MB/sec. 50% CPU in postgres, 50% Cpu in pg_dump, 50% cpu in gzip The default (5) compression level was used. So, when using pg_dump alone, I could not get significantly more than one core of CPU (all on the same box). No matter how I tried, pg_dump plus the postgres process dumping data always totaled about 102% -- it would flulctuate in top, give or take 15% at times, but the two always were very close (within 3%) of this total. Piping the whole thing to gzip gets some speedup. This indicates that perhaps the implementation or use of gzip is inappropriate on pg_dump's side or the library version is older or slower. Alternatively, the use of gzip inside pg_dump fails to pipeline CPU useage as well as piping it does, as the above shows 50% more CPU utilization when piping. I can do the same test with a single table that is 10GB later (which does dump much faster than 13MB/sec and has rows that average about 500 bytes in size). But overall I have found pg_dump's performace sorely lacking, and this is a data risk in the big picture. Postgres is very good about not losing data, but that only goes up to the limits of the hardware and OS, which is not good enough. Because of long disaster recovery times and poor replication/contingency features, it is a fairly unsafe place for data once it gets beyond a certain size and a BC plan requires minimal downtime. > Adding an option to use LZO instead of gzip could be useful... > > Compressing the uncompressed 2.6GB dump : > > - gzip -1 : > > - compressed size : 565 MB > - compression throughput : 28.5 MB/s > - decompression throughput : 74 MB/s > > - LZO -1 : > - compressed size : 696M > - compression throughput : 86 MB/s > - decompression throughput : 247 MB/s > > Conclusion : LZO could help for fast disks (RAID) or slow disks on a > CPU-starved server... > LZO would be a great option, it is very fast, especially decompression. With gzip, one rarely gains by going below gzip -3 or above gzip -6. > -- > Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance