On 8/3/09 11:56 AM, "Tom Lane" <tgl@xxxxxxxxxxxxx> wrote:

> Scott Carey <scott@xxxxxxxxxxxxxxxxx> writes:
>> I get very different (contradictory) behavior.  Server with fast RAID, 32GB
>> RAM, 2 x 4 core 3.16Ghz Xeon 54xx CPUs.  CentOS 5.2, 8.3.6.
>> No disk wait time during any test.  One test beforehand was used to prime
>> the disk cache.
>> 100% CPU in the below means one core fully used.  800% means the system is
>> fully loaded.
>
>> pg_dump > file  (on a subset of the DB with lots of tables with small tuples)
>> 6m 27s, 4.9GB; 12.9MB/sec
>> 50% CPU in postgres, 50% CPU in pg_dump
>
>> pg_dump -Fc > file.gz
>> 9m6s, output is 768M (6.53x compression); 9.18MB/sec
>> 30% CPU in postgres, 70% CPU in pg_dump
>
>> pg_dump | gzip > file.2.gz
>> 6m22s, 13MB/sec
>> 50% CPU in postgres, 50% CPU in pg_dump, 50% CPU in gzip
>
> I don't see anything very contradictory here.

The other poster got nearly 2 CPUs of work from just pg_dump + postgres.  That
contradicts my results (though it could be due to data differences or postgres
version differences).  In the other use case, compression was not slower, it
just used more CPU, which also contradicts my results.

> What you're demonstrating
> is that it's nice to be able to throw a third CPU at the compression
> part of the problem.

No, 1.5 CPUs.  Fully using a second CPU would already be great.

I'm also demonstrating that there is some artificial bottleneck somewhere
preventing postgres and pg_dump from operating concurrently.  Instead, one
waits while the other does work.  Your claim earlier in this thread was that
pipelined work was already happening between pg_dump and postgres; that seems
to be true for the other test case, but not for mine.  As a consequence, adding
compression throttles the postgres process even though compression doesn't push
any task involved to 100% CPU (or close to it).

> That's likely to remain true if we shift to a
> different compression algorithm.  I suspect if you substituted lzo for
> gzip in the third case, the picture wouldn't change very much.

That is exactly the point.  LZO would be nice (and would help mitigate this
problem), but it doesn't solve the real issue here: pg_dump is slow and
artificially throttles itself without even reaching 100% CPU in pg_dump or
postgres.

The problem remains: dumping with -Fc can be significantly slower than piping a
plain dump to a compression utility, even when no task is CPU or I/O bound.
Dumping and piping to gzip is faster, but parallel restore requires a
custom-format archive (a rough sketch of the relevant commands is at the end of
this mail).

> regards, tom lane
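
For reference, a rough sketch of the variants under discussion, with a
hypothetical database name (mydb) and output file names.  It assumes lzop is
installed and accepts stdin/stdout like gzip, and that pg_restore is from 8.4
or later for the -j option:

  # plain dump piped to an external compressor (gzip today; lzop to try LZO)
  time pg_dump mydb | gzip > mydb.sql.gz
  time pg_dump mydb | lzop -c > mydb.sql.lzo

  # custom-format archive, compressed by pg_dump itself (the slow -Fc case above)
  time pg_dump -Fc mydb > mydb.dump

  # parallel restore, which needs the custom-format archive (pg_restore -j, 8.4+)
  pg_restore -j 8 -d mydb mydb.dump

The plain-text dumps can only be restored by feeding them to psql, so they get
no benefit from -j.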