2010/5/27 David Jarvis <thangalin@xxxxxxxxx>: > Hi, Bryan. > > I was just about to reply to the thread, thanks for asking. Clustering was > key. After rebooting the machine (just to make sure absolutely nothing was > cached), I immediately ran a report on Toronto: 5.25 seconds! > > Here's what I did: > > Created a new set of tables that matched the old set, with statistics of > 1000 on the station and taken (date) columns. > Inserted the data from the old hierarchy into the new set, ordered by > station id then by date (same seven child tables as before: one per > category). > > I wanted to ensure a strong correlation between primary key and station id. > > Added three indexes per table: (a) station id; (b) date taken; and (c) > station-taken-category. > Set the station-taken-category index as CLUSTER. > Vacuumed the new tables. > Dropped the old tables. > Set the following configuration values: > > shared_buffers = 1GB > temp_buffers = 32MB > work_mem = 32MB > maintenance_work_mem = 64MB > seq_page_cost = 1.0 > random_page_cost = 2.0 > cpu_index_tuple_cost = 0.001 > effective_cache_size = 512MB > > I ran a few more reports (no reboots, but reading vastly different data > sets): > > Vancouver: 4.2s > Yellowknife: 1.7s > Montreal: 6.5s > Trois-Riviers: 2.8s > > No full table scans. I imagine some indexes are not strictly necessary and > will test to see which can be removed (my guess: the station and taken > indexes). The problem was that the station ids were scattered and so > PostgreSQL presumed a full table scan would be faster. > > Physically ordering the data by station ids triggers index use every time. > > Next week's hardware upgrade should halve those times -- unless anyone has > further suggestions to squeeze more performance out of PG. ;-) I wonder what the plan will be if you replace sc.taken_* in : m.taken BETWEEN sc.taken_start AND sc.taken_end by values. It might help the planner... Also, I'll consider explicit ordered join but I admit I haven't read the whole thread (in particular the table size). Ho, and I set statistics to a highter value for column category_id, table station_category (seeing the same resquest and explain analyze without date in the query will help) > > Dave > > -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance