Index Choice Problem

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi List,

I would like some insight from the experts here as to how I can alter
which index PostgreSQL is choosing to run a query.

First off, I'm running an active web forum (phpBB) with sometimes
hundreds of concurrent users. The query in question is one which pulls
the lists of topics in the forum. The table in question is here:

--

forums=> \d phpbb_topics;
                                     Table "public.phpbb_topics"
        Column        |         Type          |                      
Modifiers
----------------------+-----------------------+-------------------------------------------------------
 topic_id             | integer               | not null default
nextval('phpbb_topics_id_seq'::text)
 forum_id             | integer               | not null default 0
 topic_title          | character varying(60) | not null default
''::character varying
 topic_poster         | integer               | not null default 0
 topic_time           | integer               | not null default 0
 topic_views          | integer               | not null default 0
 topic_replies        | integer               | not null default 0
 topic_status         | smallint              | not null default (0)::smallint
 topic_vote           | smallint              | not null default (0)::smallint
 topic_type           | smallint              | not null default (0)::smallint
 topic_first_post_id  | integer               | not null default 0
 topic_last_post_id   | integer               | not null default 0
 topic_moved_id       | integer               | not null default 0
 topic_last_post_time | integer               | not null default 0
Indexes:
    "forum_id_phpbb_topics_index" btree (forum_id)
    "topic_id_phpbb_topics_index" btree (topic_id)
    "topic_last_post_id_phpbb_topics_index" btree (topic_last_post_id)
    "topic_last_post_time_phpbb_topics_index" btree (topic_last_post_time)
    "topic_moved_id_phpbb_topics_index" btree (topic_moved_id)

--

To layout the contents of the table, here are some relevant queries
showing the number of entries

forums=# SELECT COUNT(*) FROM phpbb_topics; SELECT COUNT(*) FROM
phpbb_topics WHERE forum_id = 71; SELECT COUNT(*) FROM phpbb_topics
WHERE forum_id = 55;
 count
--------
 190588
(1 row)

 count
-------
  1013
(1 row)

 count
-------
 35035
(1 row)

--

Ok. Now, here's the problem. I run a query to pull the list of topics
for the forum. There pagination, so the first page query looks like
this:

SELECT t.topic_id
			FROM phpbb_topics AS t
				WHERE t.forum_id = 71
					AND t.topic_id NOT IN (205026, 29046, 144569, 59780, 187424,
138635, 184973, 170551, 22419, 181690, 197254, 205130)
						ORDER BY t.topic_last_post_time DESC
							LIMIT 23 OFFSET 0

                                                                      
                                                                      
                                                                      
                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3487.78..3487.87 rows=34 width=8) (actual
time=1112.921..1113.005 rows=34 loops=1)
   ->  Sort  (cost=3486.15..3489.10 rows=1181 width=8) (actual
time=1112.087..1112.535 rows=687 loops=1)
         Sort Key: topic_last_post_time
         ->  Index Scan using forum_id_phpbb_topics_index on
phpbb_topics t  (cost=0.00..3425.89 rows=1181 width=8) (actual
time=54.650..1109.877 rows=1012 loops=1)
               Index Cond: (forum_id = 71)
               Filter: (topic_id <> 205026)
 Total runtime: 1113.268 ms
(7 rows)

--

This is the query on one of the lesser active forums (forum_id = 71)
which as list earlier only has 1013 rows. This query slow because
PostgreSQL is not using the index on the "forum_id" column, but
instead scanning through the topics via the topic_last_post_time and
filtering through the posts. This would be good for the forum_id = 55
where the most recent topics would be quickly found. Now here's the
stranger part, going deeper into the results (ie selecting pages
further down), the planner does this:

--

SELECT t.topic_id
			FROM phpbb_topics AS t
				WHERE t.forum_id = 71
					AND t.topic_id NOT IN (205026)
						ORDER BY t.topic_last_post_time DESC
							LIMIT 34 OFFSET 653

                                                                      
    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3487.78..3487.87 rows=34 width=8) (actual
time=6.140..6.202 rows=34 loops=1)
   ->  Sort  (cost=3486.15..3489.10 rows=1181 width=8) (actual
time=5.306..5.753 rows=687 loops=1)
         Sort Key: topic_last_post_time
         ->  Index Scan using forum_id_phpbb_topics_index on
phpbb_topics t  (cost=0.00..3425.89 rows=1181 width=8) (actual
time=0.070..3.581 rows=1012 loops=1)
               Index Cond: (forum_id = 71)
               Filter: (topic_id <> 205026)
 Total runtime: 6.343 ms
(7 rows)

--

This is more like how it should be done IMO. Results are much faster
when the forum id index is used. Now, the output of the first query on
the forum_id = 55 looks like this

--

SELECT t.topic_id
			FROM phpbb_topics AS t
				WHERE t.forum_id = 55
					AND t.topic_id NOT IN (159934, 168973, 79609, 179029, 61593,
184896, 190572)
						ORDER BY t.topic_last_post_time DESC
							LIMIT 28 OFFSET 0

                                                                      
                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..50.50 rows=28 width=8) (actual time=0.060..0.714
rows=28 loops=1)
   ->  Index Scan Backward using
topic_last_post_time_phpbb_topics_index on phpbb_topics t 
(cost=0.00..63232.38 rows=35063 width=8) (actual time=0.057..0.675
rows=28 loops=1)
         Filter: ((forum_id = 55) AND (topic_id <> 159934) AND
(topic_id <> 168973) AND (topic_id <> 79609) AND (topic_id <> 179029)
AND (topic_id <> 61593) AND (topic_id <> 184896) AND (topic_id <>
190572))
 Total runtime: 0.794 ms

--

This is acceptable usage when the forum_id is heavily populated. Next
now, here again puzzles me, pulling entries in the middle of forum_id
= 55

--

SELECT t.topic_id
			FROM phpbb_topics AS t
				WHERE t.forum_id = 55
					AND t.topic_id NOT IN (159934, 168973)
						ORDER BY t.topic_last_post_time DESC
							LIMIT 33 OFFSET 17458

                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=29302.43..29302.51 rows=33 width=8) (actual
time=625.907..625.969 rows=33 loops=1)
   ->  Sort  (cost=29258.78..29346.44 rows=35064 width=8) (actual
time=603.710..615.411 rows=17491 loops=1)
         Sort Key: topic_last_post_time
         ->  Seq Scan on phpbb_topics t  (cost=0.00..26611.85
rows=35064 width=8) (actual time=0.067..528.271 rows=35034 loops=1)
               Filter: ((forum_id = 55) AND (topic_id <> 159934) AND
(topic_id <> 168973))
 Total runtime: 632.444 ms
(6 rows)

--

Why is it doing a sequential scan? :(

My questions... is there a method for me to suggest which index to use
in the query. I'm think adding logic in my script depending on which
forum_id is used (since I can hard code in my scripts which are the
popular forums) and tell the planner to use a specific index first?

Secondly, why in the last output did it opt to do a sequential scan
over using the forum_id index as it did earlier.

Side note, a vacuum analayze was done just prior to running these tests.

Thank you,
--
Adam Alkins
http://www.rasadam.com
Mobile: 868-680-4612


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux