Re: Trying to understand why a query is filtering when there is a composite index

Tom Lane <tgl@xxxxxxxxxxxxx> · Sun, 18 Aug 2024 22:50:10 -0400

"Stephen Samuel (Sam)" <sam@xxxxxxxxxxxx> writes:
> There is a unique index on (a,b)
> The query is:

> SELECT b
> FROM table
> WHERE a = <id>
>   AND b IN (<ids>)

> The planner says index only scan, but is filtering on b.

> Index Only Scan using pkey on table  (cost=0.46..29.09 rows=1
> width=19) (actual time=0.033..0.053 rows=10 loops=1)
>   Index Cond: (a = 662028765)
> "  Filter: (b = ANY
> ('{634579987:662028765,561730945:662028765,505555183:662028765,472806302:662028765,401361055:662028765,363587258:662028765,346093772:662028765,314369897:662028765,289498328:662028765,217993946:662028765}'::text[]))"
>   Rows Removed by Filter: 1
>   Heap Fetches: 11
> Planning Time: 0.095 ms
> Execution Time: 0.070 ms

I think it's a good bet that this query would be *slower* if
it were done the other way.  The filter condition is eliminating
only one of the 11 rows matching "a = 662028765".  If we did what
you think you want, we'd initiate ten separate index descents
to find the other ten rows.

Whether the planner is costing this out accurately enough to
realize that, or whether it's just accidentally falling into
the right plan, I'm not sure; you've not provided nearly
enough details for anyone to guess what the other cost estimate
was.

> And why is it doing heap lookups for every row,.

Yeah, that part is a weakness I've wanted to fix for a long
time: it could do the filter condition by fetching b from the
index, but it doesn't notice that and has to go to the heap
to get b.  (If the other plan does win, it'd likely be because
of that problem and not because the index scanning strategy
per se is better.)

			regards, tom lane