Re: Perfomance of IN-clause with many elements and possible solutions

PT <wmoran@xxxxxxxxxxxxxxxxx> · Mon, 24 Jul 2017 17:17:59 -0400

On Mon, 24 Jul 2017 13:17:56 +0300
Dmitry Lazurkin <dilaz03@xxxxxxxxx> wrote:

> On 07/24/2017 01:40 AM, PT wrote:
> > In this example you count approximately 40,000,000 values, which is
> > about 40% of the table. 
>
> 4 000 000 (:
>
> > If you really need these queries to be faster, I would suggest
> > materializing the data, i.e. create a table like:
> >
> > CREATE TABLE id_counts (
> >  id BIGINT PRIMARY KEY,
> >  num BIGINT
> > )
> >
> > Then use a trigger or similar technique to keep id_counts in sync
> > with the id table. You can then run queries of the form:
> >
> > SELECT sum(num) FROM id_counts WHERE id IN :values:
> >
> > which I would wager houseboats will be significantly faster.
> I use count only for example because it uses seqscan. I want optimize
> IN-clause ;-).

The IN clause is not what's taking all the time. It's the processing of
millions of rows that's taking all the time.

Perhaps you should better describe what it is you really want to accomplish.
Regardless of what it is, if it involves processing many millions of rows,
you're probably going to need to do some sort of materialization.

-- 
PT <wmoran@xxxxxxxxxxxxxxxxx>

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general