"David West" <david.west 'at' cusppoint.com> writes: > Yes I inserted values in big batches according to a single value of 'B', so > indeed a sequence scan may have to scan forward many millions of rows before > finding the required value. That may well be why the seqscan is so slow to give your results; that said, it doesn't explain why the indexscane is not preferred. > I have been doing regular analyse commands on my table. I don't think my Like, recently? Can you post the stats? gc=# select * from pg_stats where tablename = 'foo'; You should try to ANALYZE again and see if that makes a difference, to be sure. > table is bloated, I haven't been performing updates. However I'm doing a Maybe you've been DELETE'ing then INSERT'ing some? That creates bloat too. Btw, don't forget to prefer TRUNCATE to remove everything from the table, and ANALYZE after large INSERT's. > vacuum analyse now and I'll see if that makes any difference. A single VACUUM may not report how bloated your table is, if it's been VACUUM'ed some before, but not frequently enough. If you have time for it, and you can afford a full lock on the table, only a VACUUM FULL VERBOSE will tell you the previous bloat (the "table .. truncated to .." line IIRC). > I am using postgres 8.3.1 with a default install on windows - no tweaks to > the configuration at all. With a default install, the free space map settings may well be too small for tracking free space on a table as large as 10M rows. Performing VACUUM VERBOSE on database 'template1' will show you interesting information about the current and ideal FSM settings, at the end of the output. Something like: INFO: free space map contains 37709 pages in 276 relations DETAIL: A total of 42080 page slots are in use (including overhead). 42080 page slots are required to track all free space. Current limits are: 204800 page slots, 1000 relations, using 1265 kB. 
Of course, this also depends on the frequency of your VACUUMing (if
autovacuuming is not configured or badly configured) against the
frequency of your UPDATEs and DELETEs.

> There are many other columns in my table, but none of them are used in this
> query.

Can you show us the table definition? I am too ignorant in PG to know
if that would make a difference, but it might ring a bell for others.
AFAIK, more column data may mean larger resultsets and may change the
plan (though "width=128" in the log of your explanation wouldn't mean
a lot of data is stored per row).

> Guillaume in your example you didn't add the limit clause? Postgres chooses
> the correct index in my case without the limit clause, the problem is with
> the limit clause.

Duh, forgot about that, sorry! But I did try it and it was the same:

gc=# explain select * from foo where baz is null and bar = '8' limit 15;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.42 rows=15 width=154)
   ->  Index Scan using foobar on foo  (cost=0.00..30398.66 rows=1079089 width=154)
         Index Cond: (bar = 8)
         Filter: (baz IS NULL)
(4 rows)

> One other difference with your example is both my columns are
> varchar columns, not integer and text, I don't know if that
> would make a difference.

It is always useful to know as much as possible about the actual table
definition and data when isolating a performance problem... I know it
may clash with privacy :/ but that kind of information probably will
not, will it?
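Since the physical clustering of your 'B' column is the suspected
culprit, the per-column statistics are worth a look too. A sketch,
assuming the column names from my test table:

```sql
-- correlation near +1 or -1 means the column's values are physically
-- clustered on disk (exactly what batched inserts produce);
-- n_distinct and null_frac drive the planner's row estimates.
select attname, n_distinct, null_frac, correlation
from pg_stats
where tablename = 'foo' and attname in ('bar', 'baz');
```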
With:

gc=# create table foo ( bar varchar(64), baz varchar(256) );

it doesn't make a difference yet:

gc=# explain select * from foo where baz is null and bar = '8';
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Index Scan using foobar on foo  (cost=0.00..27450.05 rows=982092 width=149)
   Index Cond: ((bar)::text = '8'::text)
   Filter: (baz IS NULL)
(3 rows)

gc=# explain select * from foo where baz is null and bar = '8' limit 15;
                                     QUERY PLAN
-----------------------------------------------------------------------------------
 Limit  (cost=0.00..0.42 rows=15 width=149)
   ->  Index Scan using foobar on foo  (cost=0.00..27450.05 rows=982092 width=149)
         Index Cond: ((bar)::text = '8'::text)
         Filter: (baz IS NULL)
(4 rows)

Btw, it would help if you could reproduce my test scenario and see
whether PG "correctly" uses the index scan. It is better to try on your
own installation, to take care of any configuration/whatever variation
which may create your problem.

> From looking at the plans, it seems to me postgres is assuming it will only
> have to sequentially scan 15 rows, which is not true in my case because
> column B is not distributed randomly (nor will it be in production). Would

Why do you say that? The plans seem to rather tell that it (correctly)
estimates that the seqscan would bring up about 1M rows for the selected
values of A and B, and that it will then limit the result to 15 rows.

> postgres not be best to ignore the limit when deciding the best index to use
> - in this simple query wouldn't the best plan to use always be the same
> with or without a limit?

I am not too sure, but I'd say no: when PG considers the LIMIT, it knows
that (potentially) fewer rows are to be actually used from the inner
resultset, so a different plan may be devised.

-- 
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36
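You can actually see how the LIMIT changes the picture in the cost
numbers themselves: the Limit node is charged a pro-rated share of the
inner node's total cost, i.e. roughly total_cost * (limit_rows /
estimated_rows). A quick check against the figures from my plan above
(the figures are the assumption here, the formula is just linear
interpolation):

```sql
-- 30398.66 is the index scan's estimated total cost, 1079089 its
-- estimated row count; fetching only 15 rows is costed at the
-- corresponding fraction:
select round((30398.66 / 1079089 * 15)::numeric, 2);
-- gives 0.42, matching the "cost=0.00..0.42" on the Limit node
```

So the planner compares these pro-rated costs, and a plan with a high
total cost but a cheap start (like an index scan in your case, or a
seqscan when it expects matches early) can win under a LIMIT.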