Re: TPC-H Scaling Factors X PostgreSQL Cluster Command

"Nelson Kotowski" <nkotowski@xxxxxxxxx> · Mon, 23 Apr 2007 12:52:36 -0300

Hi Heikki,

Thanks for answering! :)

I don't understand how creating only the indexes I cluster on would improve my cluster command performance. I believed that the other indexes wouldn't interfere, because so far they're created in a reasonable time and they don't refer to any field/column in the orders/lineitem table. Could you explain again?

As for the load, when you say "the right order to start with", do you mean I should sort the load file by the indexed column before loading it?

Thanks in advance,
Nelson P Kotowski Filho.

On 4/23/07, Heikki Linnakangas <heikki@xxxxxxxxxxxxxxxx> wrote:
Nelson Kotowski wrote:
> So far, I need to do it in three different scale factors (1, 2 and 5GB
> databases).
>
> My build process consists of creating the tables without any foreign keys,
> indexes, etc. - Running OK!
>
> Then, I load the data from the flat files generated by the DBGEN software
> into these tables. - Running OK!
>
> Finally, I run an "optimize" script that does the following:
>
> - Alter the tables to add the mandatory foreign keys;
> - Create all mandatory indexes;
> - Cluster the orders table by the orders table index;
> - Cluster the lineitem table by the lineitem table index;
> - Vacuum the database;
> - Analyze statistics.

Cluster will completely rewrite the table and indexes. On step 2, you
should only create the indexes you're clustering on, and create the rest
of them after clustering.
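
As a rough sketch of that ordering, the small Python helper below only builds the SQL statements for one table in the suggested sequence: the clustering index first, then CLUSTER, then the remaining indexes. The function name and the index names are hypothetical, and the columns are just TPC-H columns picked for illustration, not necessarily the ones you cluster on. It uses the `CLUSTER table USING index` form; PostgreSQL releases before 8.3 spell it `CLUSTER index ON table`.

```python
def optimize_statements(table, cluster_index, cluster_cols, other_indexes):
    """Return SQL statements for one table in the suggested order.

    All index names passed in are placeholders chosen by the caller.
    """
    statements = [
        # 1. Create only the index the table will be clustered on.
        f"CREATE INDEX {cluster_index} ON {table} ({', '.join(cluster_cols)});",
        # 2. Rewrite the table in index order; only this one index is rebuilt.
        f"CLUSTER {table} USING {cluster_index};",
    ]
    # 3. Create the remaining indexes once, on the already-clustered table.
    for name, cols in other_indexes:
        statements.append(f"CREATE INDEX {name} ON {table} ({', '.join(cols)});")
    # 4. Refresh planner statistics at the end.
    statements.append(f"ANALYZE {table};")
    return statements


if __name__ == "__main__":
    # Hypothetical index names; o_orderdate and o_custkey are TPC-H columns.
    for stmt in optimize_statements(
        "orders",
        "orders_orderdate_idx",
        ["o_orderdate"],
        [("orders_custkey_idx", ["o_custkey"])],
    ):
        print(stmt)
```

The point is that every index that exists at CLUSTER time gets rebuilt as part of the rewrite, so anything created before it is built twice.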

Or even better, generate and load the data in the right order to start
with, so you don't need to cluster at all.
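
A minimal sketch of what sorting the load file could look like, assuming the usual pipe-delimited DBGEN .tbl format and a file small enough to sort in memory. The function names, the file names and the choice of key field are placeholders; for the larger scale factors an external sort (for example the Unix sort utility) is probably the more practical tool.

```python
def sort_tbl_lines(lines, key_field=0, key_type=int, delimiter="|"):
    """Return the lines sorted by one delimited field.

    key_field is the zero-based position of the column to sort on and
    key_type converts the raw text to a comparable value.
    """
    return sorted(
        lines,
        key=lambda line: key_type(line.split(delimiter)[key_field]),
    )


def sort_tbl_file(src_path, dst_path, key_field=0, key_type=int):
    """Read a flat file, sort it in memory, and write the sorted copy."""
    with open(src_path, encoding="utf-8") as src:
        lines = [line for line in src if line.strip()]  # skip blank lines
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(sort_tbl_lines(lines, key_field, key_type))


if __name__ == "__main__":
    # Placeholder file names; field 0 of orders.tbl is o_orderkey.
    sort_tbl_file("orders.tbl", "orders.sorted.tbl", key_field=0, key_type=int)
```

Loading the sorted file into an empty table should leave the rows on disk in that order, so the index can simply be created afterwards.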

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com