Re: Index filter instead of index condition w/ IN / ANY queries above certain set size

Danny Shemesh <dany74q@xxxxxxxxx> · Wed, 23 Nov 2022 21:23:49 +0200

Hey Laurenz, Tom - thanks again !
> that it is cheaper to use the index that supports the ORDER BY
Thing is, that both queries use the exact same index (idx_hashes), but one uses it w/ the filter and one does not.

> This doesn't match up terribly well with the table definition you showed before
Yeah.. it was a bit hard to reproduce exactly, but the fiddle does showcase that there's some threshold to the ANY set-size
where it stops using the column in the index condition, and moves it to the filter step - I thought it might originate
from similar reasons.

> but I wonder whether tidh is a low-order index column.
The index indeed uses tidh as a low order column, and it's better to have it the other way around -
currently, it's: (tid, pidh, tidh) - where (tid, tidh, pidh) would've probably worked better.

We've already optimized the query itself - but for pure understanding of the planner decision here,
I'd really still like to understand, if possible, the difference between ANY and IN,
and why, even though the column order isn't optimal - one plan still successfully uses the index more efficiently than another. 

Any idea where I could zone-in in the source code to look for hints, maybe ?

Appreciate it !
Danny

On Wed, Nov 23, 2022 at 4:29 PM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Danny Shemesh <dany74q@xxxxxxxxx> writes:

>                                         ->  Index Only Scan using

> idx_hashes on refs  (cost=0.56..722735.47 rows=33715 width=16) (actual

> time=1727.208..1727.208 rows=1 loops=1)

>                                               Index Cond: (tid =

> '13371337-1337-1337-1337-133713371337'::uuid)

> *                                              Filter: (tidh = ANY

> ('{13391339-1339-1339-1339-133913391339}'::uuid[]))    <<<<<<<<<<<<<<<-

> Note this line*                                              Rows Removed

> by Filter: 109087

>                                               Heap Fetches: 16976

>                                               Buffers: shared hit=13051

> read=14561

>                                               I/O Timings: read=53405.294

This doesn't match up terribly well with the table definition

you showed before, but I wonder whether tidh is a low-order

index column.  If you need to optimize this specific shape

of query you need to pay attention to the index column order, per

https://www.postgresql.org/docs/current/indexes-multicolumn.html

That is, tid and tidh need to be the first two index columns.

                        regards, tom lane