Yes, I inserted values in big batches according to a single value of 'B', so indeed a sequential scan may have to scan forward many millions of rows before finding the required value.

I have been running regular ANALYZE commands on my table, and I don't think the table is bloated, as I haven't been performing updates. However, I'm doing a VACUUM ANALYZE now and I'll see if that makes any difference.

I am using Postgres 8.3.1 with a default install on Windows - no tweaks to the configuration at all. There are many other columns in my table, but none of them are used in this query.

Guillaume, in your example didn't you leave out the LIMIT clause? Postgres chooses the correct index in my case without the LIMIT clause; the problem is with the LIMIT clause. One other difference from your example is that both my columns are varchar columns, not integer and text - I don't know if that makes a difference.

From looking at the plans, it seems Postgres is assuming it will only have to sequentially scan 15 rows, which is not true in my case because column B is not distributed randomly (nor will it be in production). Wouldn't Postgres do better to ignore the LIMIT when deciding which index to use? In this simple query, wouldn't the best plan always be the same with or without a LIMIT?
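To make the data layout concrete, something like this cut-down sketch reproduces what I'm describing (table and column names are simplified stand-ins, and the real table has many more columns):

create table t (a varchar, b varchar);
-- rows are loaded in big batches per value of b, so all the b = '21'
-- rows end up physically clustered at the end of the heap instead of
-- being spread evenly through it
insert into t select null, '1'  from generate_series(1, 1000000);
insert into t select null, '21' from generate_series(1, 10000);
create index t_a on t (a);
create index t_b on t (b);
analyze t;
-- without the LIMIT the planner picks the index on b; with it, the
-- planner expects matching rows to be spread uniformly, so it guesses
-- a sequential scan will stop after a handful of rows, when in fact
-- it must first pass a million non-matching rows
explain select * from t where a is null and b = '21' limit 15;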
Thanks to all of you for your interest in my problem.

David

-----Original Message-----
From: Guillaume Cottenceau [mailto:gc@xxxxxx]
Sent: 02 September 2008 10:07
To: David West; pgsql-performance@xxxxxxxxxxxxxx
Subject: Re: [PERFORM] limit clause breaks query planner?

Wouldn't this be normal if the distribution of values were uneven, e.g. the A IS NULL AND B = '21' rows not near the beginning of the table data?

By the way, my newbie eyes on "pg_stats" seem to tell me that PG doesn't collect/use statistics about the distribution of the data; am I wrong? E.g. in that situation, when a few A IS NULL AND B = '21' rows move from the beginning to the end of the table data, a seqscan becomes a totally different story (the correlation changes, but may not change a lot if only a few rows move).

However, I cannot reproduce a situation similar to David's:

gc=# create table foo ( bar int, baz text );
CREATE TABLE
gc=# insert into foo ( select generate_series(0, 10000000) / 1000000,
       case when random() < 0.05 then 'Today Alcatel-Lucent has announced that P******* C**** is appointed non-executive Chairman and B** V******** is appointed Chief Executive Officer.'
       else null end );
INSERT 0 10000001
gc=# create index foobar on foo(bar);
CREATE INDEX
gc=# create index foobaz on foo(baz);
CREATE INDEX
gc=# explain select * from foo where baz is null and bar = '8';
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=1297.96..1783.17 rows=250 width=36)
   Recheck Cond: ((bar = 8) AND (baz IS NULL))
   ->  BitmapAnd  (cost=1297.96..1297.96 rows=250 width=0)
         ->  Bitmap Index Scan on foobar  (cost=0.00..595.69 rows=50000 width=0)
               Index Cond: (bar = 8)
         ->  Bitmap Index Scan on foobaz  (cost=0.00..701.90 rows=50000 width=0)
               Index Cond: (baz IS NULL)
(7 rows)

gc=# analyze foo;
ANALYZE
gc=# explain select * from foo where baz is null and bar = '8';
                                 QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using foobar on foo  (cost=0.00..30398.66 rows=1079089 width=154)
   Index Cond: (bar = 8)
   Filter: (baz IS NULL)
(3 rows)

This is using pg 8.3.1 and:

random_page_cost = 2
effective_cache_size = 256MB
shared_buffers = 384MB

David, is there relevant information you've forgotten to tell us:

- any other columns in your table?
- is the table bloated?
- has the table never been analyzed?
- what version of postgresql? what overridden configuration?

-- 
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36
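For what it's worth on the pg_stats question above: PG does collect a per-column physical-ordering statistic, correlation. A quick way to inspect it, reusing the foo table from the example (substitute your own table name):

select attname, n_distinct, correlation
from pg_stats
where tablename = 'foo';

-- correlation close to +1 or -1 means the column's values follow the
-- physical row order (as after batched loads); close to 0 means they
-- are spread randomly. The planner uses it when costing index scans,
-- but the LIMIT shortcut at issue still assumes matching rows are
-- spread uniformly through the heap.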