Well, the reason it likes the first plan is that it has a smaller
estimated cost ;-). Basically this is a startup-time-versus-total-time
issue: the cost of the seqscan+limit is estimated to be about 1/8447'th
of the time to read the whole table, since it's estimating 8447
candidate matches and assuming that those are uniformly distributed in
the table. Meanwhile, the bitmap scan has a significant startup cost
because the entire indexscan is completed before we start to do any
fetching from the heap. The overestimate of the number of matching
rows contributes directly to overestimating the cost of the indexscan,
too. It ends up being a near thing --- 158 vs 166 cost units --- but
on the basis of these estimates the planner did the right thing.
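In round numbers (back-of-the-envelope, not the planner's exact
arithmetic):

   cost of seqscan + LIMIT 1  ~  (cost of whole-table seqscan) / 8447   ~  158
   cost of bitmap plan        ~  (cost of full indexscan) + heap fetches ~ 166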
So, what you need to do to make this better is to get it to have a
better idea of how many rows match the query condition; the overestimate
is both making the expensive plan look cheap, and making the cheaper
plan look expensive. Cranking up the statistics target for the
hts_code_id column (and re-ANALYZEing) ought to fix it. If all your
tables are this large you might want to just increase
default_statistics_target across the board.
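For instance (the 1000 here is only an illustrative target, and the
table/column names are the ones from your query):

   ALTER TABLE exp_detls ALTER COLUMN hts_code_id SET STATISTICS 1000;
   ANALYZE exp_detls;

or raise default_statistics_target in postgresql.conf and re-ANALYZE
everything.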
regards, tom lane
Thanks for the great description of what's happening; very informative. Upping the stats target to the maximum of 10000 (from the default 100) makes my example query use a faster plan, but not other problematic queries we have in the same vein (just add a few more values to the IN clause). For example (this is with the stats target set to 10000 and the table re-analyzed):
=>explain analyze SELECT "exp_detls".id FROM "exp_detls" WHERE ("exp_detls"."hts_code_id" IN (8127,8377,8374,10744,11125,8375,8344,9127,9345)) LIMIT 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..152.94 rows=1 width=4) (actual time=12057.637..12057.637 rows=0 loops=1)
-> Seq Scan on exp_detls (cost=0.00..1651399.83 rows=10798 width=4) (actual time=12057.635..12057.635 rows=0 loops=1)
Filter: (hts_code_id = ANY ('{8127,8377,8374,10744,11125,8375,8344,9127,9345}'::integer[]))
Total runtime: 12057.678 ms
From your description I think the planner is making two problematic assumptions that are leading to our issue:
First is that the potential matches are uniformly distributed across the physical table. There are a number of reasons this may not be the case (correlated insertion order, update patterns, etc.), but here there's a very clear one: the table is clustered on an index that leads with the column we're querying against (hts_code_id), and nothing has been inserted or updated in the table since the last time we ran CLUSTER on it (see the schema in the original e-mail).
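(For concreteness, the physical layout comes from something along these lines; the index name below is made up since the actual schema is back in the original e-mail:

   CLUSTER exp_detls USING index_exp_detls_on_hts_code_id;

so all the rows for a given hts_code_id are physically adjacent, and rows for an id that isn't in the table aren't anywhere.)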
Second is that it's assuming the IN-clause values are actually present in the table (and at some reasonably large frequency); in the worst cases, like the one above, they aren't present at all, which forces a full table scan. I don't know exactly how the expected frequency of a value that isn't in the most-common-values list is estimated, but I'm guessing it's something based on the residual probability left over after the common values, divided among the estimated number of distinct values. If so, that assumes the supplied values actually are among the table's distinct values, which seems unjustified.
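If my guess is right, the arithmetic for a value that isn't in the most-common-values list would be roughly (again, just my guess at the formula, not something I've checked against the source):

   selectivity ~ (1 - sum(most_common_freqs) - null_frac) / (n_distinct - number of MCVs)

with an IN list summing those per-value estimates, which would explain how nine values that don't exist in the table at all still add up to the 10798 expected rows above.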
Perhaps there should be some estimate of whether a supplied value is one of the distinct values at all (you could probably do something statistically solid with a not-too-large Bloom filter), or some scaling by the width of the histogram bucket the value falls in (i.e. assuming an even distribution within the bucket). Then again, in a lot of common situations people only supply values they know are present, so maybe this would make things worse more often than better (perhaps LIMIT 1, or better yet EXISTS, could serve as a hint that the value isn't known to be present). Experimenting a bit, it doesn't seem to matter which values are supplied, so the estimate isn't taking any kind of distribution over the histogram boundaries into account. For example, this query:
explain analyze SELECT "exp_detls".id FROM "exp_detls" WHERE ("exp_detls"."hts_code_id" IN (20000,20001,200002,20003,20004,20005,20006,20007,20008)) LIMIT 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..152.94 rows=1 width=4) (actual time=12925.646..12925.646 rows=0 loops=1)
-> Seq Scan on exp_detls (cost=0.00..1651399.83 rows=10798 width=4) (actual time=12925.644..12925.644 rows=0 loops=1)
Filter: (hts_code_id = ANY ('{20000,20001,200002,20003,20004,20005,20006,20007,20008}'::integer[]))
Total runtime: 12925.677 ms
This has the same expected row count as the earlier query with the same number of not-present IN values, but judging from the histogram boundaries these values are much, much less likely to find rows (assuming an even distribution within each bucket) than the values in the earlier query (the histogram boundaries are something like 1, 5, 10, 15, ..., 11995, 12000, 80000, 80005, ...). I can send the actual stats if you're interested (it's large, though). It feels like this should factor into the estimates somehow.
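For reference, this is the sort of thing I'm looking at (pulled straight out of pg_stats, so it should be easy to reproduce on your end):

   SELECT n_distinct, null_frac, correlation,
          most_common_vals, most_common_freqs, histogram_bounds
   FROM pg_stats
   WHERE tablename = 'exp_detls' AND attname = 'hts_code_id';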
Also, we occasionally run into surprises like this, especially as tables grow (fast query plans hit some planning tipping point and turn into very slow ones, or some new combination of values is requested, etc.). It would be nice if there were a straightforward way to nudge the planner towards less risky plans: not necessarily worst-case planning, but maybe picking the plan based on expected cost + n% * worst-case cost for some reasonable estimate of the worst case (or using a probability distribution around the expected cost and ranking plans by the nth-percentile time). Just thinking out loud.
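(Right now the only lever I know of for a case like this is the blunt per-session one, something like:

   BEGIN;
   SET LOCAL enable_seqscan = off;  -- discourage the risky seqscan for this one query
   SELECT "exp_detls".id FROM "exp_detls" WHERE ("exp_detls"."hts_code_id" IN (8127,8377,8374,10744,11125,8375,8344,9127,9345)) LIMIT 1;
   COMMIT;

which works, but it's obviously a workaround rather than a fix.)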
Btw, thanks for all your work on what is a really great and useful tool!
Tim