Ugh - bad plan with LIMIT in a complex SELECT, any way to fix this?

Karl Denninger <karl@xxxxxxxxxxxxx> · Sat, 08 May 2010 15:35:19 -0500

I have a message posted in pgsql-general that outlines what I thought
was an indexing problem - it's not, so I'm bringing it here.

I dumped the table from our production system and stuffed it into a test
machine, then started refining and playing with the query until I was
able to get it to the "de-minimus" that showed the issue.  Note that the
actual query is frequently MUCH more complicated, but without the LIMIT
shown below the planner seems to do a decent job of figuring out how to
"get it done."

The actual table in question has ~2m rows totaling several gigabytes of
space.

Here's an abstract of the schema:

                                      Table "public.post"
  Column   |           Type           |                      
Modifiers                       
-----------+--------------------------+--------------------------------------------------------
subject   | text                     |
 message   | text                     |
 inserted  | timestamp with time zone |
 modified  | timestamp with time zone |
 replied   | timestamp with time zone |
 ordinal   | integer                  | not null default
nextval('post_ordinal_seq'::regclass)

Indexes:
    "post_pkey" PRIMARY KEY, btree (ordinal)
    "idx_message" gin (to_tsvector('english'::text, message))
    "idx_subject" gin (to_tsvector('english'::text, subject))

There's a bunch of other stuff in the table and many more indices, plus
foreign references, but stripping the table down to JUST THIS shows the
problem.

ticker=# explain analyze select * from post where to_tsvector('english',
message) @@ to_tsquery('violence') order by modified desc;
                                                            QUERY
PLAN                                                            
-----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=31795.16..31819.68 rows=9808 width=436) (actual
time=14.222..17.213 rows=3421 loops=1)
   Sort Key: modified
   Sort Method:  quicksort  Memory: 3358kB
   ->  Bitmap Heap Scan on post  (cost=1418.95..31144.90 rows=9808
width=436) (actual time=1.878..7.514 rows=3421 loops=1)
         Recheck Cond: (to_tsvector('english'::text, message) @@
to_tsquery('violence'::text))
         ->  Bitmap Index Scan on idx_message  (cost=0.00..1416.49
rows=9808 width=0) (actual time=1.334..1.334 rows=3434 loops=1)
               Index Cond: (to_tsvector('english'::text, message) @@
to_tsquery('violence'::text))
 Total runtime: 20.547 ms
(8 rows)

Ok, very nice.  20ms.  I like that.

Now lets limit the return to 100 items:

ticker=# explain analyze select * from post where to_tsvector('english',
message) @@ to_tsquery('violence') order by modified desc limit 100;

QUERY
PLAN                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..5348.69 rows=100 width=436) (actual
time=198.047..2607.077 rows=100 loops=1)
   ->  Index Scan Backward using post_modified on post 
(cost=0.00..524599.31 rows=9808 width=436) (actual
time=198.043..2606.864 rows=100 loops=1)
         Filter: (to_tsvector('english'::text, message) @@
to_tsquery('violence'::text))
 Total runtime: 2607.231 ms
(4 rows)

Bad.  Notice that the optimizer decided it was going to do an index scan
with an internal filter on it!  That's BACKWARD; what I want is for the
planner to first execute the index scan on the GIN index, then order the
return and limit the returned data set.

But it gets much worse - let's use something that's NOT in the message
base (the table in question has some ~2m rows by the way and consumes
several gigabytes on disk - anything that actually READS the table is
instant "bad news!")

ticker=# explain analyze select * from post where to_tsvector('english',
message) @@ to_tsquery('hosehead') order by modified;
                                                           QUERY
PLAN                                                          
--------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=31795.16..31819.68 rows=9808 width=436) (actual
time=0.407..0.407 rows=0 loops=1)
   Sort Key: modified
   Sort Method:  quicksort  Memory: 25kB
   ->  Bitmap Heap Scan on post  (cost=1418.95..31144.90 rows=9808
width=436) (actual time=0.402..0.402 rows=0 loops=1)
         Recheck Cond: (to_tsvector('english'::text, message) @@
to_tsquery('hosehead'::text))
         ->  Bitmap Index Scan on idx_message  (cost=0.00..1416.49
rows=9808 width=0) (actual time=0.399..0.399 rows=0 loops=1)
               Index Cond: (to_tsvector('english'::text, message) @@
to_tsquery('hosehead'::text))
 Total runtime: 0.441 ms
(8 rows)

Very fast, as you'd expect - it returned nothing.  Now let's try it with
a "LIMIT":

ticker=# explain analyze select * from post where to_tsvector('english',
message) @@ to_tsquery('hosehead') order by modified limit 100;
NOTICE:  word is too long to be indexed
DETAIL:  Words longer than 2047 characters are ignored.
                                                                  QUERY
PLAN                                                                 
----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..5348.69 rows=100 width=436) (actual
time=254217.850..254217.850 rows=0 loops=1)
   ->  Index Scan using post_modified on post  (cost=0.00..524599.31
rows=9808 width=436) (actual time=254217.847..254217.847 rows=0 loops=1)
         Filter: (to_tsvector('english'::text, message) @@
to_tsquery('hosehead'::text))
 Total runtime: 254217.891 ms
(4 rows)

ticker=#

Oh crap.  It actually went through and looked at the entire freaking
table - one message at a time.

An attempt to re-write the query into something that FORCES the planner
to do the right thing fails too.  For example:

ticker=# explain analyze select * from post where ordinal in (select
ordinal from post where to_tsvector('english', message) @@
to_tsquery('hosehead')) order by modified;         
                                                                 QUERY
PLAN                                                                
--------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=94886.44..94910.96 rows=9808 width=436) (actual
time=0.884..0.884 rows=0 loops=1)
   Sort Key: public.post.modified
   Sort Method:  quicksort  Memory: 25kB
   ->  Nested Loop  (cost=31173.42..94236.19 rows=9808 width=436)
(actual time=0.879..0.879 rows=0 loops=1)
         ->  HashAggregate  (cost=31173.42..31271.50 rows=9808 width=4)
(actual time=0.877..0.877 rows=0 loops=1)
               ->  Bitmap Heap Scan on post  (cost=1422.95..31148.90
rows=9808 width=4) (actual time=0.850..0.850 rows=0 loops=1)
                     Recheck Cond: (to_tsvector('english'::text,
message) @@ to_tsquery('hosehead'::text))
                     ->  Bitmap Index Scan on idx_message 
(cost=0.00..1420.50 rows=9808 width=0) (actual time=0.848..0.848 rows=0
loops=1)
                           Index Cond: (to_tsvector('english'::text,
message) @@ to_tsquery('hosehead'::text))
         ->  Index Scan using post_ordinal on post  (cost=0.00..6.41
rows=1 width=436) (never executed)
               Index Cond: (public.post.ordinal = public.post.ordinal)
 Total runtime: 0.985 ms
(12 rows)

Fast, if convoluted.

ticker=# explain analyze select * from post where ordinal in (select
ordinal from post where to_tsvector('english', message) @@
to_tsquery('hosehead')) order by modified limit 100;
NOTICE:  word is too long to be indexed
DETAIL:  Words longer than 2047 characters are ignored.

QUERY
PLAN                                                                     
------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..19892.88 rows=100 width=436) (actual
time=270563.091..270563.091 rows=0 loops=1)
   ->  Nested Loop Semi Join  (cost=0.00..1951093.77 rows=9808
width=436) (actual time=270563.088..270563.088 rows=0 loops=1)
         ->  Index Scan using post_modified on post 
(cost=0.00..509887.63 rows=1961557 width=436) (actual
time=0.015..3427.627 rows=1953674 loops=1)
         ->  Index Scan using post_ordinal on post  (cost=0.00..0.73
rows=1 width=4) (actual time=0.134..0.134 rows=0 loops=1953674)
               Index Cond: (public.post.ordinal = public.post.ordinal)
               Filter: (to_tsvector('english'::text,
public.post.message) @@ to_tsquery('hosehead'::text))
 Total runtime: 270563.147 ms
(7 rows)

ticker=#

Ok, that didn't work either.

Interestingly enough, if I crank up the limit to 500, it starts behaving!

ticker=# explain analyze select * from post where ordinal in (select
ordinal from post where to_tsvector('english', message) @@
to_tsquery('hosehead')) order by modified limit 500;

QUERY
PLAN                                                                   
--------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=94724.91..94726.16 rows=500 width=436) (actual
time=1.475..1.475 rows=0 loops=1)
   ->  Sort  (cost=94724.91..94749.43 rows=9808 width=436) (actual
time=1.473..1.473 rows=0 loops=1)
         Sort Key: public.post.modified
         Sort Method:  quicksort  Memory: 25kB
         ->  Nested Loop  (cost=31173.43..94236.19 rows=9808 width=436)
(actual time=1.468..1.468 rows=0 loops=1)
               ->  HashAggregate  (cost=31173.43..31271.51 rows=9808
width=4) (actual time=1.466..1.466 rows=0 loops=1)
                     ->  Bitmap Heap Scan on post 
(cost=1422.95..31148.91 rows=9808 width=4) (actual time=1.440..1.440
rows=0 loops=1)
                           Recheck Cond: (to_tsvector('english'::text,
message) @@ to_tsquery('hosehead'::text))
                           ->  Bitmap Index Scan on idx_message 
(cost=0.00..1420.50 rows=9808 width=0) (actual time=1.438..1.438 rows=0
loops=1)
                                 Index Cond:
(to_tsvector('english'::text, message) @@ to_tsquery('hosehead'::text))
               ->  Index Scan using post_ordinal on post 
(cost=0.00..6.41 rows=1 width=436) (never executed)
                     Index Cond: (public.post.ordinal = public.post.ordinal)
 Total runtime: 1.600 ms
(13 rows)

Why is the planner "taking into consideration" the LIMIT (I know the
docs say it does) and choosing to sequentially scan a table of nearly 2
million rows?!  I don't see how that makes sense.... irrespective of the
query being LIMITed.

If it matters setting enable_seqscan OFF does not impact the results.

-- Karl
begin:vcard
fn:Karl Denninger
n:Denninger;Karl
email;internet:karl@xxxxxxxxxxxxx
x-mozilla-html:TRUE
version:2.1
end:vcard

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance