Re: limit clause breaks query planner?

Gregory Stark <stark@xxxxxxxxxxxxxxxx> · Tue, 02 Sep 2008 10:44:54 +0100

>>>> "Limit  (cost=0.00..3.68 rows=15 width=128) (actual time=85837.043..85896.140 rows=15 loops=1)"
>>>> "  ->  Seq Scan on my_table this_  (cost=0.00..258789.88 rows=1055580 width=128) (actual time=85837.038..85896.091 rows=15 loops=1)"
>>>> "        Filter: ((A IS NULL) AND ((B)::text = '21'::text))"
>>>> "Total runtime: 85896.214 ms"

Postgres does collect and use statistics about what fraction of the "A" column
is null. It also collects and uses statistics about what fraction of the "B"
column is 21 (using a histogram). And it does take the LIMIT into account.

I think the other poster might well be right about this table being extremely
bloated. You could test by running and posting the results of:

VACUUM VERBOSE my_table

What it doesn't collect is where in the table those records are -- so if there
are a lot of them then it might use a sequential scan regardless of whether
they're at the beginning or end of the table. That seems unlikely to be the
problem though.

The other thing it doesn't collect is how many of the B=21 records have null
As. So if a large percentage of the table has A as null then it will assume
that's true for the B=21 records and if there are a lot of B=21 records then
it will assume a sequential scan will find matches quickly. If in fact the two
columns are highly correlated and B=21 records almost never have A null
whereas records with other values of B have lots of null values then Postgres
might make a bad decision here.

Also, it only has the statitics for B=21 via a histogram. If the distribution
of B is highly skewed so that, for example values between 20 and 25 are very
common but B=21 happens to be quite rare then Postgres might get a bad
estimate here. You could improve this by raising the statistics target for the
B column and re-analyzing.

That brings up another question -- when was the last time this table was
analyzed?

What estimates and actual results does postgres get for simple queries like:

EXPLAIN ANALYZE SELECT count(*) FROM my_table WHERE A IS NULL;
EXPLAIN ANALYZE SELECT count(*) FROM my_table WHERE B=21;
EXPLAIN ANALYZE SELECT count(*) FROM my_table WHERE A IS NULL AND B=21;

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!