Re: limit clause breaks query planner?

"Matt Smiley" <mss@xxxxxxxxxxx> · Wed, 03 Sep 2008 23:10:34 -0700

"Tom Lane" <tgl@xxxxxxxxxxxxx> writes:
>  "Matt Smiley" <mss@xxxxxxxxxxx> writes:
>  >  So an Index Scan is always going to have a higher cost estimate than
>  >  an equivalent Seq Scan returning the same result rows (unless
>  >  random_page_cost is < 1).  That's why I think the planner is always
>  >  preferring the plan that uses a Seq Scan.
>  
>  If that were the case, we'd never choose an indexscan at all...

You're right, that was a silly guess.

>  It's true that a plain indexscan is not preferred for queries that will
>  return a large fraction of the table.  However, it should be willing to
>  use a bitmap scan for this query, given default cost settings (the
>  default cost settings will cause it to prefer bitmap scan for retrieving
>  up to about a third of the table, in my experience).  I too am confused
>  about why it doesn't prefer that choice in the OP's example.

It looks like the bitmap scan has a higher cost estimate because the entire bitmap index must be built before beginning the heap scan and returning rows up the pipeline.  The row-count limit can't be pushed lower than the bitmap-heap-scan like it can for the basic index-scan.

test_8_3_3=# set enable_seqscan = false ;
SET

test_8_3_3=# set enable_indexscan = false ;
SET

test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 limit 15 ;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=17070.22..17071.02 rows=15 width=8) (actual time=606.902..607.086 rows=15 loops=1)
   ->  Bitmap Heap Scan on my_table  (cost=17070.22..69478.96 rows=988217 width=8) (actual time=606.892..606.983 rows=15 loops=1)
         Recheck Cond: (b = 3)
         Filter: (a IS NULL)
         ->  Bitmap Index Scan on idx_b  (cost=0.00..16823.17 rows=1033339 width=0) (actual time=592.657..592.657 rows=1000000 loops=1)
               Index Cond: (b = 3)
 Total runtime: 607.340 ms
(7 rows)

>  It would be interesting to alter the random_page_cost setting and see if he gets
>  different results.

Using an unmodified postgresql.conf, the cost estimate for an index-scan were so much higher than for a seqscan that random_page_cost had to be set below 0.2 before the index-scan was preferred.  However, it looks like this was mainly because effective_cache_size was too small.  The planner thought the cache was only 128 MB, and the size of the complete table+index was 39492 + 21946 pages * 8 KB/block = 330 MB.  It makes sense for the cost estimate to be so much higher if blocks are expected to be repeatedly re-fetched from disk.  I wonder if David's effective_cache_size is too small.

test_8_3_3=# reset all ;
RESET

test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 limit 15 ;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2.50 rows=15 width=8) (actual time=0.036..0.239 rows=15 loops=1)
   ->  Seq Scan on my_table  (cost=0.00..164492.74 rows=988217 width=8) (actual time=0.028..0.138 rows=15 loops=1)
         Filter: ((a IS NULL) AND (b = 3))
 Total runtime: 0.338 ms
(4 rows)

test_8_3_3=# set enable_seqscan = false ;
SET

test_8_3_3=# show random_page_cost ;
 random_page_cost
------------------
 4
(1 row)

test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 limit 15 ;
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..45.99 rows=15 width=8) (actual time=0.051..0.200 rows=15 loops=1)
   ->  Index Scan using idx_b on my_table  (cost=0.00..3029924.36 rows=988217 width=8) (actual time=0.043..0.100 rows=15 loops=1)
         Index Cond: (b = 3)
         Filter: (a IS NULL)
 Total runtime: 0.308 ms
(5 rows)

test_8_3_3=# set random_page_cost = 0.19 ;
SET
test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 limit 15 ;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2.45 rows=15 width=8) (actual time=0.050..0.201 rows=15 loops=1)
   ->  Index Scan using idx_b on my_table  (cost=0.00..161190.65 rows=988217 width=8) (actual time=0.042..0.097 rows=15 loops=1)
         Index Cond: (b = 3)
         Filter: (a IS NULL)
 Total runtime: 0.307 ms
(5 rows)

Now fix effective_cache_size and try again.

test_8_3_3=# reset all ;
RESET

test_8_3_3=# set effective_cache_size = '500MB' ;
SET
test_8_3_3=# set enable_seqscan = false ;
SET
test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 limit 15 ;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2.78 rows=15 width=8) (actual time=0.051..0.204 rows=15 loops=1)
   ->  Index Scan using idx_b on my_table  (cost=0.00..183361.21 rows=988217 width=8) (actual time=0.043..0.103 rows=15 loops=1)
         Index Cond: (b = 3)
         Filter: (a IS NULL)
 Total runtime: 0.311 ms
(5 rows)

That's better, but still not quite low enough cost estimate to beat the seqscan.  Try adjusting random_page_cost again.

test_8_3_3=# set random_page_cost = 3 ;
SET
test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 limit 15 ;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2.16 rows=15 width=8) (actual time=0.052..0.202 rows=15 loops=1)
   ->  Index Scan using idx_b on my_table  (cost=0.00..142053.51 rows=988217 width=8) (actual time=0.043..0.100 rows=15 loops=1)
         Index Cond: (b = 3)
         Filter: (a IS NULL)
 Total runtime: 0.311 ms
(5 rows)

That's enough: index-scan's 142053.51 beats seqscan's 164492.74.  We no longer need to set enable_seqscan=false.