Re: Benchmark Data requested

"Jignesh K. Shah" <J.K.Shah@xxxxxxx> · Mon, 04 Feb 2008 17:55:33 -0500

Doing it at low scales is not attractive.

Commercial databases are publishing at scale factor of 1000(about 1TB) 
to 10000(10TB) with one in 30TB space. So ideally right now tuning 
should start at 1000 scale factor.

Unfortunately I have tried that before with PostgreSQL the few of the 
problems are as follows:

Single stream loader of PostgreSQL takes hours to load data. (Single 
stream load... wasting all the extra cores out there)

Multiple table loads ( 1 per table) spawned via script  is bit better 
but hits wal problems.

To avoid wal problems, I  had created tables and load statements within 
the same transaction, faster but cannot create index before load or it 
starts writing to wal... AND if indexes are created after load, it takes 
about a day or so to create all the indices required. (Its single 
threaded and creating multiple indexes/indices at the same time could 
result in running out of temporary "DISK" space since the tables are so 
big. Which means 1 thread at a time is the answer for creating tables 
that are really big. It is slow.

Boy, by this time most people doing TPC-H in high end give up on 
PostgreSQL.

I have not even started Partitioning of tables yet since with the 
current framework, you have to load the tables separately into each 
tables which means for the TPC-H data you need "extra-logic" to take 
that table data and split it into each partition child table. Not stuff 
that many people want to do by hand.

Then for the power run that is essentially running one query at a time 
should essentially be able to utilize the full system (specially 
multi-core systems), unfortunately PostgreSQL can use only one core. 
(Plus since this is read only and there is no separate disk reader all 
other processes are idle) and system is running at 1/Nth capacity (where 
N is the number of cores/threads)

(I am not sure here with Partitioned tables, do you get N processes 
running in the system when you scan the partitioned table?)

Even off-loading work like "fetching the data into bufferpool" into 
separate processes will go big time with this type of workloads.

I would be happy to help out if folks here want to do work related to 
it. Infact if you have time, I can request a project in one of the Sun 
Benchmarking center to see what we can learn with community members 
interested in understanding where PostgreSQL performs and fails.

Regards,
Jignesh

Greg Smith wrote:
On Mon, 4 Feb 2008, Simon Riggs wrote:

Would anybody like to repeat these tests with the latest production
versions of these databases (i.e. with PGSQL 8.3)

Do you have any suggestions on how people should run TPC-H?  It looked 
like a bit of work to sort through how to even start this exercise.

--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

              http://archives.postgresql.org

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
      choose an index scan if your joining column's datatypes do not
      match