Index filter instead of index condition w/ IN / ANY queries above certain set size

Danny Shemesh <dany74q@xxxxxxxxx> · Wed, 23 Nov 2022 10:49:25 +0200

Hey everyone,
I'm trying to understand when the planner decides to use an index condition vs an index filter, specifically for x IN / = ANY {set}, and if we can tune some parameters to move it between these plans.

We have two tables and a query similar to the following fiddle:
https://www.db-fiddle.com/f/g9tZvVk65RRg3XRskQZpXK/1

In the fiddle - we see that the planner uses an index condition up to a set of 3 elements, and fallbacks to use an index filter when the set has 4 or more elements;

In our case - which I couldn't easily replicate in the fiddle - the threshold is a single element - that is, a single element uses the index condition, and 2 or more elements use an index filter, and the latter is much slower on our data set.

This ^ also causes a side effect, where IN queries of a single element are 'flattened' to a single element comparison (x = y), but ANY queries aren't flattened, and are still being compared against a set of one element;
This flattening is what makes our IN queries use the index condition, but the = ANY(array[one-element]) to use the index filter.

I've tried playing w/ the random_page_cost, create extended statistics and tune other sys parameters - but it didn't nudge the planner the other way; 

Would appreciate if anyone could shed some light on this behavior (code refs are also welcomed), and if there's any way we could help the planner move between the plans.

Thanks a ton - appreciate your time !
Danny