Re: Have I b0rked something? Slow comparisons on "where x in (...)"

"Dann Corbit" <DCorbit@xxxxxxxxx> · Wed, 2 May 2007 12:45:08 -0700

Have you done a vacuum on the table recently?

I would be curious to see how:

select stuff from table 
where index_key = <key1> AND non_index_row in ('xyz','abc','def')
UNION ALL
select stuff from table 
where index_key = <key2> AND non_index_row in ('xyz','abc','def')
...
UNION ALL
select stuff from table 
where index_key = <key600> AND non_index_row in ('xyz','abc','def')

performs by comparison.  If, after a vacuum, it performs better than the
IN list, then the IN list might benefit from a bit of analysis for
better tuning chances.

> -----Original Message-----
> From: pgsql-general-owner@xxxxxxxxxxxxxx [mailto:pgsql-general-
> owner@xxxxxxxxxxxxxx] On Behalf Of Stephen Harris
> Sent: Wednesday, May 02, 2007 11:32 AM
> To: Postgres General
> Subject: [GENERAL] Have I b0rked something? Slow comparisons on "where
x
> in (...)"
> 
> Postgres version 8.0.9 on Solaris 2.8.  I know it's old but...
> 
> I have a table with a million rows.
> 
> I need to select data from this table based on an indexed column; I
need
> to select 600 possible values from the column, returning around 24,000
> rows of data.
> 
> In perl I have a hash which has 600 key values in it.
> 
> So I did:
> 
>   "select stuff from table where index_key in (" .
>       join(",",keys %hash) . ") AND non_index_row in
('xyz','abc','def')
> 
> And in the perl
>   while(fetch())
>   {
>     do_stuff
>   }
> 
> This resulted in a query string which executed in 12 minutes.  If I
> did an "explain" on the query string then I can see it was being
expanded
> to 600 OR statements
>   where (index_key = 1) OR (index_key = 2) OR .....
> 
> 
> Now as an alternative option I did
>   select stuff from table where non_index_row in ('xyz','abc','def')
> and in the perl
>   while(fetch())
>   {
>     next unless $hash{$_->{index_key}};
>     do_stuff
>   }
> 
> To me this should be slower since we're selecting more rows, throwing
> the data back to the perl and then discarding values I didn't want.
> 
> Imagine my surprise when the result took 3 minutes to execute.
> 
> Have I broken something, somewhere?  Or are IN comparisons really that
> slow?
> 
> For what it's worth, a simple
> 
>   explain select count(*) from table where index_key in (1,2,3,4,....)
> 
> uses the index up until 156 values but then switches to sequential
scan
> when there are 157 or more values in query.
> 
> Any thoughts?  I fear my poor tuning attempts may have caused other
> slow downs!
> 
> --
> 
> rgds
> Stephen
> 
> ---------------------------(end of
broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@xxxxxxxxxxxxxx so that
your
>        message can get through to the mailing list cleanly