Re: Perfomance of IN-clause with many elements and possible solutions

Jeff Janes <jeff.janes@xxxxxxxxx> · Mon, 24 Jul 2017 19:50:07 -0700

On Jul 24, 2017 14:19, "PT" <wmoran@xxxxxxxxxxxxxxxxx> wrote:
On Mon, 24 Jul 2017 13:17:56 +0300

Dmitry Lazurkin <dilaz03@xxxxxxxxx> wrote:

> On 07/24/2017 01:40 AM, PT wrote:

> > In this example you count approximately 40,000,000 values, which is

> > about 40% of the table.

>

> 4 000 000 (:

>

> > If you really need these queries to be faster, I would suggest

> > materializing the data, i.e. create a table like:

> >

> > CREATE TABLE id_counts (

> >  id BIGINT PRIMARY KEY,

> >  num BIGINT

> > )

> >

> > Then use a trigger or similar technique to keep id_counts in sync

> > with the id table. You can then run queries of the form:

> >

> > SELECT sum(num) FROM id_counts WHERE id IN :values:

> >

> > which I would wager houseboats will be significantly faster.

> I use count only for example because it uses seqscan. I want optimize

> IN-clause ;-).

The IN clause is not what's taking all the time. It's the processing of

millions of rows that's taking all the time.

It isn't either-or.  It is the processing of millions of rows over the large in-list which is taking the time. Processing an in-list as a hash table would be great, but no one has gotten around to it implementing it yet.  Maybe Dmitry will be the one to do that.

Cheers,

Jeff