On 7/30/09 1:58 PM, "Kevin Grittner" <Kevin.Grittner@xxxxxxxxxxxx> wrote:

> Scott Carey <scott@xxxxxxxxxxxxxxxxx> wrote:
>
>> Now, what needs to be known with the pg_dump is not just how fast
>> compression can go (assuming it's gzip) but also what the duty cycle
>> time of the compression is. If it is single threaded, there is all
>> the network and disk time to cut out of this, as well as all the CPU
>> time that pg_dump does without compression.
>
> Well, I established a couple messages back on this thread that pg_dump
> piped to psql to a database on the same machine writes the 70GB
> database to disk in two hours, while pg_dump to a custom format file
> at default compression on the same machine writes the 50GB file in six
> hours. No network involved, less disk space written. I'll try it
> tonight at -Z0.

So, I'm not sure what the pg_dump custom format overhead is apart from the compression; there is probably some overhead from that format other than the compression itself. -Z1 might be interesting too, but obviously each test run takes some time.

Interesting that your uncompressed case is only 40% larger. For me, the compressed dump is in the range of 20% of the size of the uncompressed one.

> One thing I've been wondering about is what, exactly, is compressed in
> custom format. Is it like a .tar.gz file, where the compression is a
> layer over the top, or are individual entries compressed?

It is instructive to open up a compressed custom format file in 'less' or another text viewer. Basically, it is the same as the uncompressed dump, with all the DDL uncompressed but the binary chunks compressed. It would seem (an educated guess from looking at the raw file, not the code) that the table data is compressed and the DDL points to an offset in the file where the compressed blob for the COPY data lives.

> If the
> latter, what's the overhead on setting up each compression stream? Is
> there some minimum size before that kicks in? (I know, I should go
> check the code myself.
> Maybe in a bit. Of course, if someone already
> knows, it would be quicker....)

Gzip does have some quirky performance behavior depending on the chunk size of the data you stream into it.
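A rough way to poke at the per-stream overhead question without reading the pg_dump source: the sketch below uses Python's zlib module (zlib implements deflate, the same algorithm gzip uses) to compare compressing many entries as independent streams against one continuous stream, and sweeps levels 0, 1 and 6 (6 is zlib's default level). It assumes, per the guess above and not verified against the code, that custom format compresses each table's data as its own stream. The data is synthetic, and the helpers make_entries, per_entry_size, single_stream_size and level_sweep are hypothetical names for illustration only.

```python
import time
import zlib


def make_entries(n_entries, rows_per_entry):
    """Hypothetical helper: synthetic tab-separated, COPY-like chunks (not real pg_dump output)."""
    entries = []
    for e in range(n_entries):
        rows = (
            f"{e}\t{i}\tcustomer_{i % 100}\t2009-07-30\n" for i in range(rows_per_entry)
        )
        entries.append("".join(rows).encode("ascii"))
    return entries


def per_entry_size(entries, level=6):
    """Compress each entry as its own independent zlib stream and sum the output sizes.

    Every stream pays its own header/trailer and starts with an empty history window.
    """
    total = 0
    for data in entries:
        comp = zlib.compressobj(level)
        total += len(comp.compress(data)) + len(comp.flush())
    return total


def single_stream_size(entries, level=6):
    """Feed all entries through one continuous zlib stream (the .tar.gz-style layering)."""
    comp = zlib.compressobj(level)
    total = 0
    for data in entries:
        total += len(comp.compress(data))
    total += len(comp.flush())
    return total


def level_sweep(data, levels=(0, 1, 6)):
    """Return {level: (compressed_size, seconds)}; level 0 only wraps data in stored blocks."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        out = zlib.compress(data, level)
        results[level] = (len(out), time.perf_counter() - start)
    return results


if __name__ == "__main__":
    many_small = make_entries(2000, 5)
    few_large = make_entries(10, 1000)
    for name, entries in (("2000 small entries", many_small), ("10 large entries", few_large)):
        raw = sum(len(e) for e in entries)
        print(
            f"{name}: raw={raw} per-entry={per_entry_size(entries)} "
            f"single-stream={single_stream_size(entries)}"
        )
    data = b"".join(few_large)
    for level, (size, secs) in level_sweep(data).items():
        print(f"level {level}: {size} bytes in {secs:.4f}s")
```

With many tiny entries the per-entry total should come out clearly larger, since each stream carries its own framing and cannot reuse history from earlier entries; as entries grow, that penalty should shrink relative to the total. Timings this small are noisy, and real table data will compress differently, so treat the numbers as illustrative only.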