TPC-H has two runs:
Power Run, which is a single stream (Q1-Q22, RF1, RF2).
Throughput Run, which has "N" streams (N depends on the scale factor) running
simultaneously, each executing a mixed sequence of the same queries and the
two update functions. During the throughput run you can expect to max out the CPU.
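To make the difference between the two runs concrete, here is a minimal Python sketch of the stream layouts. The names (`power_run`, `throughput_run`, `max_busy_cores`) are hypothetical; the random query ordering stands in for the fixed per-stream permutations that the TPC-H spec defines, and the separate refresh stream carrying one RF1/RF2 pair per query stream is an assumption of this sketch. Since each stream maps to its own PostgreSQL backend, the stream count bounds how many cores can be busy.

```python
import random

QUERIES = [f"Q{i}" for i in range(1, 23)]  # Q1..Q22
REFRESH = ["RF1", "RF2"]                   # the two update (refresh) functions

def power_run():
    # One stream: all queries and both refresh functions, strictly one at a time.
    return [QUERIES + REFRESH]

def throughput_run(n_streams, seed=0):
    # N query streams run simultaneously, each with its own ordering of Q1..Q22.
    # Random shuffles are a stand-in for the spec's fixed permutations (assumption).
    rng = random.Random(seed)
    streams = []
    for _ in range(n_streams):
        order = QUERIES[:]
        rng.shuffle(order)
        streams.append(order)
    # Assumed: a separate refresh stream with one RF1/RF2 pair per query stream.
    streams.append(REFRESH * n_streams)
    return streams

def max_busy_cores(streams):
    # Each stream is one PostgreSQL backend, and a backend uses at most one core.
    return len(streams)
```

With this layout, `max_busy_cores(power_run())` is 1 no matter how many cores the machine has, while a throughput run with N query streams can keep N+1 cores busy.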
But commercial databases generally run the Power Run quite well even on
multi-core systems (Oracle, without RAC, has published results with 144 cores
on Solaris).
As for the I/O system saturating the CPU, it is twofold:
the kernel fetching in the data, which saturates at some rate, and,
in this case, PostgreSQL reading the data and putting it in its
buffer pool.
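The two costs can be modeled with a short Python sketch. The stage rates are hypothetical inputs you would have to measure separately, and the function name is made up for illustration. The model assumes that if both steps run back to back on one core, their per-MB CPU times add; if the two steps could instead overlap on separate cores, the slower step alone would set the limit.

```python
def single_backend_scan_rate(kernel_mb_s, bufpool_mb_s, overlapped=False):
    """Estimate scan MB/sec for one backend from the two stage rates.

    kernel_mb_s:   rate at which the kernel alone can fetch the data in (hypothetical input)
    bufpool_mb_s:  rate at which PostgreSQL alone can read pages into its buffer pool
    overlapped:    True if the two stages run concurrently on separate cores
    """
    if overlapped:
        # Pipelined stages: the slower one is the bottleneck.
        return min(kernel_mb_s, bufpool_mb_s)
    # Serial on one core: the seconds spent per MB in each stage add up.
    seconds_per_mb = 1.0 / kernel_mb_s + 1.0 / bufpool_mb_s
    return 1.0 / seconds_per_mb
```

In the serial case the combined rate is lower than either stage on its own (600 and 300 MB/sec combine to 200 MB/sec), which is why neither stage's standalone speed is the number that limits a scan.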
An example of how I use it is as follows:
Do a SELECT query on a table such that it results in a full table scan
without actually returning any rows.
Now keep throwing hardware (better storage) at it until it saturates the CPU.
That is the practical maximum you can get from the CPU/OS combination
(assuming unlimited storage bandwidth). This number is primarily used for
estimating how fast one of the TPC-H queries will complete.
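The procedure can be sketched in Python. `cpu_ceiling` and `estimated_scan_seconds` are hypothetical helpers; the input is a list of scan rates you measured yourself while adding storage step by step, and the 5% plateau tolerance is an arbitrary assumption. The zero-row query only needs a predicate that matches nothing and cannot use an index, so the whole table is still read.

```python
def cpu_ceiling(rates_mb_s, tolerance=0.05):
    """Find the CPU/OS scan ceiling from measurements taken with ever more storage.

    rates_mb_s: scan rates (MB/sec), one per storage configuration, ordered by
                increasing storage bandwidth.
    Returns the plateau rate, or None if every step still helped (still storage-bound).
    """
    for prev, cur in zip(rates_mb_s, rates_mb_s[1:]):
        # More storage gave less than `tolerance` extra throughput: the CPU is the limit.
        if cur < prev * (1 + tolerance):
            return max(prev, cur)
    return None

def estimated_scan_seconds(table_mb, ceiling_mb_s):
    # Rough lower bound for a TPC-H query dominated by a full scan of this table.
    return table_mb / ceiling_mb_s
```

If `cpu_ceiling` returns None, the storage is still the bottleneck and more storage has to be added before the ceiling is known.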
In my tests with PostgreSQL, I generally reach the CPU limit without
even reaching the bandwidth of the underlying storage.
Just to give numbers:
A single 2Gb Fibre Channel port can practically go up to 180 MB/sec.
A single 4Gb port has proven to go up to 360-370 MB/sec.
So to saturate a 4Gb FC port, PostgreSQL has to be able to scan at 370 MB/sec
without saturating the CPU.
Then comes software striping, which allows multiple ports to be striped
together to increase the aggregate bandwidth. Now the scan has to be
able to drive N x 370 MB/sec (all on a single core).
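The port arithmetic can be written out in Python. The per-port figures are the practical numbers quoted in this message (taking 370 MB/sec for 4Gb); the function names are hypothetical, and the sketch assumes striping adds bandwidth linearly across ports.

```python
import math

# Practical per-port throughput in MB/sec, as quoted in this message.
FC_PORT_MB_S = {"2Gb": 180, "4Gb": 370}

def required_scan_rate(n_ports, port="4Gb"):
    # Scan rate a single backend must sustain to keep n striped ports busy
    # (assumes striped bandwidth scales linearly with the number of ports).
    return n_ports * FC_PORT_MB_S[port]

def ports_one_core_can_drive(scan_rate_mb_s, port="4Gb"):
    # Number of striped ports a given single-core scan rate can fully saturate.
    return math.floor(scan_rate_mb_s / FC_PORT_MB_S[port])
```

For example, keeping four striped 4Gb ports busy needs a 1480 MB/sec scan from one core, while a 400 MB/sec scan saturates only one 4Gb port (or two 2Gb ports).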
I had some numbers, and some limitations based on CPU frequency,
block size, etc., but those were from the 8.1 days or so.
I think to take PostgreSQL a bit more high end, we first have to improve
these numbers.
Running these sorts of tests on PostgreSQL farms for every release would
actually help people see the amount of data it can drive through.
We could also work on some database operation metrics to gauge
how much each release improves over older releases. I have ideas
for a few of them.
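As one hypothetical example of such a metric (the specific metrics are not spelled out in this message), here is a Python sketch of a release-over-release comparison. It assumes each release is measured with the same zero-row scan on the same hardware; the function name is made up and no measured numbers are implied.

```python
def release_improvements(rates_by_release):
    """Percent change in scan rate from each release to the next.

    rates_by_release: ordered list of (release, scan MB/sec) pairs, all measured
                      with the same test on the same hardware (assumption).
    Returns a list of (release, percent change vs. the previous release) pairs.
    """
    out = []
    for (_, prev), (rel, cur) in zip(rates_by_release, rates_by_release[1:]):
        out.append((rel, 100.0 * (cur - prev) / prev))
    return out
```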
Regards,
Jignesh
Gregory Stark wrote:
"Jignesh K. Shah" <J.K.Shah@xxxxxxx> writes:
Then the power run, which is essentially running one query at a time, should
be able to utilize the full system (especially multi-core systems);
unfortunately, PostgreSQL can use only one core. (Plus, since this is read-only
and there is no separate disk reader, all other processes are idle.) So the system
is running at 1/Nth capacity (where N is the number of cores/threads).
Is the whole benchmark like this or is this just one part of it?
Is the i/o system really able to saturate the cpu though?