Re: limit clause breaks query planner?

Guillaume Cottenceau <gc@xxxxxx> · Tue, 02 Sep 2008 11:06:48 +0200

Russell Smith <mr-russ 'at' pws.com.au> writes:

> Pavel Stehule wrote:
>> Hello
>>
>> 2008/9/1 David West <david.west@xxxxxxxxxxxxx>:
>>   
>>> Thanks for your suggestion but the result is the same.
>>>
>>> Here is the explain analyse output from different queries.
>>> Select * from my_table where A is null and B = '21' limit 15
>>>
>>> "Limit  (cost=0.00..3.68 rows=15 width=128) (actual time=85837.043..85896.140 rows=15 loops=1)"
>>> "  ->  Seq Scan on my_table this_  (cost=0.00..258789.88 rows=1055580 width=128) (actual time=85837.038..85896.091 rows=15 loops=1)"
>>> "        Filter: ((A IS NULL) AND ((B)::text = '21'::text))"
>>> "Total runtime: 85896.214 ms"
>>>
>>>     
> [snip]
>
> Further to Pavel's comments;
>
> (actual time=85837.038..85896.091 rows=15 loops=1)
>
> That's 85 seconds on a sequence scan to return the first tuple.  The table is not bloated by any chance is it?

Wouldn't this be e.g. normal if the distribution of values would
be uneven, e.g. A IS NULL AND B = '21' not near the beginning of
the table data?

By the way, my newbie eyes on "pg_stats" seem to tell me that PG
doesn't collect/use statistics about the distribution of the
data, am I wrong? E.g. in that situation, when a few A IS NULL
AND B = '21' rows move from the beginning to the end of the table
data, a seqscan becomes a totally different story.. (the
correlation changes, but may not change a lot if only a few rows
move).

However, I cannot reproduce a similar situation to David's.

gc=# create table foo ( bar int, baz text );
CREATE TABLE
gc=# insert into foo ( select generate_series(0, 10000000) / 1000000, case when random() < 0.05 then 'Today Alcatel-Lucent has announced that P******* C**** is appointed non-executive Chairman and B** V******** is appointed Chief Executive Officer.' else null end );
INSERT 0 10000001
gc=# create index foobar on foo(bar);
CREATE INDEX
gc=# create index foobaz on foo(baz);
CREATE INDEX
gc=# explain select * from foo where baz is null and bar = '8';
                                   QUERY PLAN                                    
---------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=1297.96..1783.17 rows=250 width=36)
   Recheck Cond: ((bar = 8) AND (baz IS NULL))
   ->  BitmapAnd  (cost=1297.96..1297.96 rows=250 width=0)
         ->  Bitmap Index Scan on foobar  (cost=0.00..595.69 rows=50000 width=0)
               Index Cond: (bar = 8)
         ->  Bitmap Index Scan on foobaz  (cost=0.00..701.90 rows=50000 width=0)
               Index Cond: (baz IS NULL)
(7 rows)

gc=# analyze foo;
ANALYZE
gc=# explain select * from foo where baz is null and bar = '8';
                                  QUERY PLAN                                  
------------------------------------------------------------------------------
 Index Scan using foobar on foo  (cost=0.00..30398.66 rows=1079089 width=154)
   Index Cond: (bar = 8)
   Filter: (baz IS NULL)
(3 rows)

This is using pg 8.3.1 and:

random_page_cost = 2
effective_cache_size = 256MB
shared_buffers = 384MB

David, is there relevant information you've forgot to tell:

- any other columns in your table?
- is table bloated?
- has table never been analyzed?
- what version of postgresql? what overriden configuration?

-- 
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36