Hi List, I would like some insight from the experts here as to how I can alter which index PostgreSQL is choosing to run a query. First off, I'm running an active web forum (phpBB) with sometimes hundreds of concurrent users. The query in question is one which pulls the lists of topics in the forum. The table in question is here: -- forums=> \d phpbb_topics; Table "public.phpbb_topics" Column | Type | Modifiers ----------------------+-----------------------+------------------------------------------------------- topic_id | integer | not null default nextval('phpbb_topics_id_seq'::text) forum_id | integer | not null default 0 topic_title | character varying(60) | not null default ''::character varying topic_poster | integer | not null default 0 topic_time | integer | not null default 0 topic_views | integer | not null default 0 topic_replies | integer | not null default 0 topic_status | smallint | not null default (0)::smallint topic_vote | smallint | not null default (0)::smallint topic_type | smallint | not null default (0)::smallint topic_first_post_id | integer | not null default 0 topic_last_post_id | integer | not null default 0 topic_moved_id | integer | not null default 0 topic_last_post_time | integer | not null default 0 Indexes: "forum_id_phpbb_topics_index" btree (forum_id) "topic_id_phpbb_topics_index" btree (topic_id) "topic_last_post_id_phpbb_topics_index" btree (topic_last_post_id) "topic_last_post_time_phpbb_topics_index" btree (topic_last_post_time) "topic_moved_id_phpbb_topics_index" btree (topic_moved_id) -- To layout the contents of the table, here are some relevant queries showing the number of entries forums=# SELECT COUNT(*) FROM phpbb_topics; SELECT COUNT(*) FROM phpbb_topics WHERE forum_id = 71; SELECT COUNT(*) FROM phpbb_topics WHERE forum_id = 55; count -------- 190588 (1 row) count ------- 1013 (1 row) count ------- 35035 (1 row) -- Ok. Now, here's the problem. I run a query to pull the list of topics for the forum. There pagination, so the first page query looks like this: SELECT t.topic_id FROM phpbb_topics AS t WHERE t.forum_id = 71 AND t.topic_id NOT IN (205026, 29046, 144569, 59780, 187424, 138635, 184973, 170551, 22419, 181690, 197254, 205130) ORDER BY t.topic_last_post_time DESC LIMIT 23 OFFSET 0 QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=3487.78..3487.87 rows=34 width=8) (actual time=1112.921..1113.005 rows=34 loops=1) -> Sort (cost=3486.15..3489.10 rows=1181 width=8) (actual time=1112.087..1112.535 rows=687 loops=1) Sort Key: topic_last_post_time -> Index Scan using forum_id_phpbb_topics_index on phpbb_topics t (cost=0.00..3425.89 rows=1181 width=8) (actual time=54.650..1109.877 rows=1012 loops=1) Index Cond: (forum_id = 71) Filter: (topic_id <> 205026) Total runtime: 1113.268 ms (7 rows) -- This is the query on one of the lesser active forums (forum_id = 71) which as list earlier only has 1013 rows. This query slow because PostgreSQL is not using the index on the "forum_id" column, but instead scanning through the topics via the topic_last_post_time and filtering through the posts. This would be good for the forum_id = 55 where the most recent topics would be quickly found. Now here's the stranger part, going deeper into the results (ie selecting pages further down), the planner does this: -- SELECT t.topic_id FROM phpbb_topics AS t WHERE t.forum_id = 71 AND t.topic_id NOT IN (205026) ORDER BY t.topic_last_post_time DESC LIMIT 34 OFFSET 653 QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=3487.78..3487.87 rows=34 width=8) (actual time=6.140..6.202 rows=34 loops=1) -> Sort (cost=3486.15..3489.10 rows=1181 width=8) (actual time=5.306..5.753 rows=687 loops=1) Sort Key: topic_last_post_time -> Index Scan using forum_id_phpbb_topics_index on phpbb_topics t (cost=0.00..3425.89 rows=1181 width=8) (actual time=0.070..3.581 rows=1012 loops=1) Index Cond: (forum_id = 71) Filter: (topic_id <> 205026) Total runtime: 6.343 ms (7 rows) -- This is more like how it should be done IMO. Results are much faster when the forum id index is used. Now, the output of the first query on the forum_id = 55 looks like this -- SELECT t.topic_id FROM phpbb_topics AS t WHERE t.forum_id = 55 AND t.topic_id NOT IN (159934, 168973, 79609, 179029, 61593, 184896, 190572) ORDER BY t.topic_last_post_time DESC LIMIT 28 OFFSET 0 QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..50.50 rows=28 width=8) (actual time=0.060..0.714 rows=28 loops=1) -> Index Scan Backward using topic_last_post_time_phpbb_topics_index on phpbb_topics t (cost=0.00..63232.38 rows=35063 width=8) (actual time=0.057..0.675 rows=28 loops=1) Filter: ((forum_id = 55) AND (topic_id <> 159934) AND (topic_id <> 168973) AND (topic_id <> 79609) AND (topic_id <> 179029) AND (topic_id <> 61593) AND (topic_id <> 184896) AND (topic_id <> 190572)) Total runtime: 0.794 ms -- This is acceptable usage when the forum_id is heavily populated. Next now, here again puzzles me, pulling entries in the middle of forum_id = 55 -- SELECT t.topic_id FROM phpbb_topics AS t WHERE t.forum_id = 55 AND t.topic_id NOT IN (159934, 168973) ORDER BY t.topic_last_post_time DESC LIMIT 33 OFFSET 17458 QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------- Limit (cost=29302.43..29302.51 rows=33 width=8) (actual time=625.907..625.969 rows=33 loops=1) -> Sort (cost=29258.78..29346.44 rows=35064 width=8) (actual time=603.710..615.411 rows=17491 loops=1) Sort Key: topic_last_post_time -> Seq Scan on phpbb_topics t (cost=0.00..26611.85 rows=35064 width=8) (actual time=0.067..528.271 rows=35034 loops=1) Filter: ((forum_id = 55) AND (topic_id <> 159934) AND (topic_id <> 168973)) Total runtime: 632.444 ms (6 rows) -- Why is it doing a sequential scan? :( My questions... is there a method for me to suggest which index to use in the query. I'm think adding logic in my script depending on which forum_id is used (since I can hard code in my scripts which are the popular forums) and tell the planner to use a specific index first? Secondly, why in the last output did it opt to do a sequential scan over using the forum_id index as it did earlier. Side note, a vacuum analayze was done just prior to running these tests. Thank you, -- Adam Alkins http://www.rasadam.com Mobile: 868-680-4612