Hi Simon,
I have some insight into TPC-H on how it works.
First of all I think it is a violation of TPC rules to publish numbers
without auditing them first. So even if I do the test to show the
better performance of PostgreSQL 8.3, I cannot post it here or any
public forum without doing going through the "process". (Even though it
is partial benchmark as they are just doing the equivalent of the
PowerRun of TPCH) Maybe the PR of PostgreSQL team should email
info@xxxxxxx about them and see what they have to say about that comparison.
On the technical side:
Remember all TPC-H queries when run sequentially on PostgreSQL uses only
1 core or virtual CPU so it is a very bad for system to use it with
PostgreSQL (same for MySQL too).
Also very important unless you are running the UPDATE FUNCTIONS which
are separate queries, all these Q1-Q22 Queries are pure "READ-ONLY"
queries. Traditionally I think PostgreSQL does lack "READ-SPEED"s
specially since it is bottlenecked by the size of the reads it does
(BLOCKSIZE). Major database provides multi-block parameters to do
multiple of reads/writes in terms of blocksizes to reduce IOPS and also
for read only they also have READ-AHEAD or prefetch sizes which is
generally bigger than multi-block or extent sizes to aid reads.
Scale factor is in terms of gigs and hence using max scale of 5 (5G) is
pretty useless since most of the rows could be cached in modern day
systems. And comparing with 0.01 is what 10MB? Size of recent L2 cache
of Intel is probably bigger than that size.
If you are doing tuning for TPC-H Queries focus on few of them:
For example Query 1 is very Join intensive and if your CPU is not 100%
used then you have a problem in your IO to solve before tuning it.
Another example is Query 16 is literally IO scan speed, many people use
it to see if the database can scan at "line speeds" of the storage,
ending up with 100% CPU means the database cannot process that many rows
(just to bring it in).
In essence each query does some combination of system features to
highlight the performance. However since it is an old benchmark,
database companies end up "re-engineering" their technologies to gain
advantage in this benchmark (Hence its time for a successor in work
called TPC-DS which will have more than 100 such queries)
Few of the technologies that have really helped gain ground in TPC-H world
* Hash and/or Range Partitioning of tables ( PostgreSQL 8.3 can do that
but the setup cost of writing schema is great specially since data has
to be loaded in separate tables)
* Automated Aggregated Views - used by optmiziers - database technology
to update more frequently used aggregations in a smaller views
* Cube views Index - like bitmap but multidimensional (I think ..but not
sure)
That said, is it useful to be used in "Regression testing in PostgreSQL
farms. I would think yes.. specially Q16
Hope this helps.
Regards,
Jignesh
Simon Riggs wrote:
Can I ask for some help with benchmarking?
There are some results here that show PostgreSQL is slower in some cases
than Monet and MySQL. Of course these results were published immediately
prior to 8.2 being released, plus run out-of-the-box, so without even
basic performance tuning.
Would anybody like to repeat these tests with the latest production
versions of these databases (i.e. with PGSQL 8.3), and with some
sensible tuning settings for the hardware used? It will be useful to get
some blind tests with more sensible settings.
http://monetdb.cwi.nl/projects/monetdb//SQL/Benchmark/TPCH/
Multiple runs from different people/different hardware is useful since
they help to iron-out differences in hardware and test methodology. So
don't worry if you see somebody else doing this also.
Thanks,
---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq