Re: Benchmark Data requested

"Jignesh K. Shah" <J.K.Shah@xxxxxxx> · Mon, 04 Feb 2008 19:49:22 -0500

TPC-H has two runs:
the PowerRun, which is a single stream (Q1-22, RF1, RF2),
and the Throughput Run, which has "N" streams (N depends on scale) running
simultaneously in a mixed sequence of the same queries and the two
update functions. During the throughput run you can expect to max out the CPU...
But commercial databases generally have PowerRuns running quite well even
on multi-cores (Oracle, without RAC, has published results with 144 cores on
Solaris).

As for the I/O system saturating the CPU, it's twofold:
the kernel fetching in the data, which saturates at some value,
and, in this case, PostgreSQL reading the data and putting it in its
buffer pool.

An example of how I use it is as follows:
Do a select query on a table such that it results in a table scan without
actually returning any rows.
Now keep throwing hardware (better storage) at it till it saturates the CPU.
That's the practical max you can do with the CPU/OS combination
(assuming unlimited storage bandwidth). This one is primarily used in
estimating how fast one of the queries in TPC-H will complete.
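
To make the arithmetic behind that estimate concrete, here is a minimal sketch in Python. It assumes you have already timed a table scan yourself. The function names and the figures in the usage note are hypothetical illustrations, not measurements from my tests.

```python
# Minimal sketch: turn one timed table scan into a scan rate, then use that
# rate to estimate how long a scan of another table would take.
# All names here are hypothetical; nothing below comes from PostgreSQL itself.

MB = 1_000_000  # decimal megabytes, to match MB/sec figures quoted for storage


def scan_rate_mb_per_sec(table_bytes, elapsed_sec):
    """Observed scan rate for a table of table_bytes scanned in elapsed_sec."""
    if elapsed_sec <= 0:
        raise ValueError("elapsed_sec must be positive")
    return table_bytes / MB / elapsed_sec


def estimated_scan_seconds(table_bytes, rate_mb_per_sec):
    """Estimated time to scan table_bytes at a previously measured rate."""
    if rate_mb_per_sec <= 0:
        raise ValueError("rate_mb_per_sec must be positive")
    return table_bytes / MB / rate_mb_per_sec
```

For instance, if a 2,000 MB table scans in 10 seconds once the CPU is saturated, the rate is 200 MB/sec, and a 10,000 MB table would be estimated at 50 seconds. This only bounds the scan portion of a query; joins, sorts and aggregation add to it.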

In my tests with PostgreSQL, I generally reach the CPU limit without
even reaching the bandwidth of the underlying storage.
Just to give numbers:
A single 2Gb Fibre Channel port can practically go up to 180 MB/sec.
Single 4Gb ports have proven to go up to 360-370 MB/sec.
So to saturate a 4Gb FC port, PostgreSQL has to be able to scan 370 MB/sec
without saturating the CPU.
Then comes software striping, which allows multiple ports to be striped
together, increasing the available bandwidth... Now scanning has to be
able to drive N x 370 MB/sec (all on a single core).
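
As a rough illustration of that comparison, the following Python sketch multiplies the per-port figures quoted above by the number of striped ports and checks which side runs out first. The function names and the single-core scan rates passed in are hypothetical; only the 180 and 370 MB/sec port figures come from this mail.

```python
# Minimal sketch: compare a single-core scan rate against the aggregate
# bandwidth of N striped Fibre Channel ports.
# Port figures are the practical maxima quoted in this mail; everything else
# (names, example scan rates) is a hypothetical illustration.

PORT_MB_PER_SEC = {"2Gb": 180, "4Gb": 370}


def striped_bandwidth(port_type, n_ports):
    """Aggregate MB/sec of n_ports striped together, assuming ideal scaling."""
    if n_ports < 1:
        raise ValueError("n_ports must be at least 1")
    return PORT_MB_PER_SEC[port_type] * n_ports


def bottleneck(cpu_scan_mb_per_sec, port_type, n_ports):
    """Return 'cpu' if the scan saturates the CPU before the storage, else 'storage'."""
    storage = striped_bandwidth(port_type, n_ports)
    return "cpu" if cpu_scan_mb_per_sec < storage else "storage"
```

With four 4Gb ports the scan would need to sustain 4 x 370 = 1480 MB/sec to keep the storage busy. Ideal linear scaling across striped ports is an assumption here; real striping usually loses some bandwidth to overhead.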

I had some numbers, and I found some limitations based on CPU frequency,
block size, etc., but those were from the 8.1 days or so.

I think to take PostgreSQL a bit more high end, we first have to scale out
these numbers.
Running these sorts of tests on PostgreSQL farms for every release actually
does help people see the amount of data that it can drive through...

We can also work on some database operation metrics to gauge
how much each release is improving over older releases. I have ideas
for a few of them.

Regards,
Jignesh

Gregory Stark wrote:
"Jignesh K. Shah" <J.K.Shah@xxxxxxx> writes:

Then for the power run that is essentially running one query at a time should
essentially be able to utilize the full system (specially multi-core systems),
unfortunately PostgreSQL can use only one core. (Plus since this is read only
and there is no separate disk reader all other processes are idle) and system
is running at 1/Nth capacity (where N is the number of cores/threads)

Is the whole benchmark like this or is this just one part of it?

Is the i/o system really able to saturate the cpu though?
