Re: Random Page Cost and Planner

Cédric Villemain <cedric.villemain.debian@xxxxxxxxx> · Thu, 27 May 2010 10:03:09 +0200

2010/5/27 David Jarvis <thangalin@xxxxxxxxx>:
> Hi, Bryan.
>
> I was just about to reply to the thread, thanks for asking. Clustering was
> key. After rebooting the machine (just to make sure absolutely nothing was
> cached), I immediately ran a report on Toronto: 5.25 seconds!
>
> Here's what I did:
>
> Created a new set of tables that matched the old set, with statistics of
> 1000 on the station and taken (date) columns.
> Inserted the data from the old hierarchy into the new set, ordered by
> station id then by date (same seven child tables as before: one per
> category).
>
> I wanted to ensure a strong correlation between primary key and station id.
>
> Added three indexes per table: (a) station id; (b) date taken; and (c)
> station-taken-category.
> Set the station-taken-category index as CLUSTER.
> Vacuumed the new tables.
> Dropped the old tables.
> Set the following configuration values:
>
> shared_buffers = 1GB
> temp_buffers = 32MB
> work_mem = 32MB
> maintenance_work_mem = 64MB
> seq_page_cost = 1.0
> random_page_cost = 2.0
> cpu_index_tuple_cost = 0.001
> effective_cache_size = 512MB
>
> I ran a few more reports (no reboots, but reading vastly different data
> sets):
>
> Vancouver: 4.2s
> Yellowknife: 1.7s
> Montreal: 6.5s
> Trois-Riviers: 2.8s
>
> No full table scans. I imagine some indexes are not strictly necessary and
> will test to see which can be removed (my guess: the station and taken
> indexes). The problem was that the station ids were scattered and so
> PostgreSQL presumed a full table scan would be faster.
>
> Physically ordering the data by station ids triggers index use every time.
>
> Next week's hardware upgrade should halve those times -- unless anyone has
> further suggestions to squeeze more performance out of PG. ;-)

I wonder what the plan will be if you replace sc.taken_* in :
m.taken BETWEEN sc.taken_start AND sc.taken_end
by values. It might help the planner...

Also, I'll consider explicit ordered join but I admit I haven't read
the whole thread (in particular the table size).
Ho, and I set statistics to a highter value for column category_id,
table station_category  (seeing the same resquest and explain analyze
without date in the query will help)

>
> Dave
>
>

-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance