Hi,

I have a single table with about 10 million rows and two indexes. Index A is on a column A with 95% NULL values. Index B is on a column B with about 10 distinct values, i.e. about a million rows per value.

When I run a simple query on the table (no joins) with the condition A IS NULL AND B = '21', it uses the correct index, index B. However, when I add a LIMIT clause of 15, Postgres decides to do a sequential scan :s.
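For reference, the setup and the two queries look roughly like this (table, column, and index names are simplified placeholders, not the real ones):

    CREATE INDEX idx_a ON my_table (A);   -- A is ~95% NULL
    CREATE INDEX idx_b ON my_table (B);   -- B has ~10 distinct values, ~1M rows each

    -- Uses index B, as expected:
    SELECT * FROM my_table WHERE A IS NULL AND B = '21';

    -- Switches to a sequential scan:
    SELECT * FROM my_table WHERE A IS NULL AND B = '21' LIMIT 15;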
Looking at the results from EXPLAIN:

    Limit  (cost=0.00..3.69 rows=15 width=128)
      ->  Seq Scan on my_table this_  (cost=0.00..252424.24 rows=1025157 width=128)
            Filter: ((A IS NULL) AND ((B)::text = '21'::text))

It appears that Postgres is (very incorrectly) assuming that it will only have to retrieve 15 rows from the sequential scan, and so gives a total cost of just 3.69. In reality it has to scan forward through many rows before it finds the matching values, which yields very poor performance on my table.
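If it helps, the mis-estimate should show up by running the same statement under EXPLAIN ANALYZE, with and without the limit (simplified names again):

    EXPLAIN ANALYZE SELECT * FROM my_table WHERE A IS NULL AND B = '21' LIMIT 15;
    EXPLAIN ANALYZE SELECT * FROM my_table WHERE A IS NULL AND B = '21';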
If I disable sequential scans (set enable_seqscan = false), it then incorrectly uses index A, the one with 95% NULL values: it seems to apply the same logic again, that it will only have to retrieve 15 rows because of the LIMIT clause, and concludes that an index scan on A is faster than an index scan on B. Only by deleting the index on A and disabling sequential scans will it use the correct index, which is of course by far the fastest.
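That experiment was roughly this (simplified names again):

    SET enable_seqscan = false;   -- planner then picks the index on A instead
    DROP INDEX idx_a;             -- only after this does it pick the index on B
    EXPLAIN SELECT * FROM my_table WHERE A IS NULL AND B = '21' LIMIT 15;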
Is there an assumption in the planner that a LIMIT of 15 means Postgres will only have to read 15 rows? If so, is this a bad assumption? If a particular plan is faster without a limit, then surely it will also be faster with the limit. Any workarounds for this?

Thanks,
David