Re: hash semi join caused by "IN (select ...)"

Dave Johansen <davejohansen@xxxxxxxxx> · Wed, 18 May 2011 07:00:50 -0700

On Wed, May 18, 2011 at 1:46 AM, Clemens Eisserer <linuxhippy@xxxxxxxxx> wrote:

Hi,

Does anybody know why the planner treats "= ANY(ARRAY(select ...))"

differently than "IN(select ...)"?

Which one is preferable, when I already have a lot of joins?

Thanks, Clemens

2011/5/17 Clemens Eisserer <linuxhippy@xxxxxxxxx>:

> Hi,

>

>>> select .... from t1 left join t2 .... WHERE id IN (select ....)

>>

>> Does it work as expected with one less join?  If so, try increasing

>> join_collapse_limit ...

>

> That did the trick - thanks a lot. I only had to increase

> join_collapse_limit a bit and now get an almost perfect plan.

> Instead of hash-joining all the data, the planner generates

> nested-loop-joins with index only on the few rows I fetch.

>

> Using = ANY(array(select... )) also seems to work, I wonder which one

> works better. Does ANY(ARRAY(...)) force the optimizer to plan the

> subquery seperated from the main query?

>

> Thanks, Clemens

>

--

Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)

To make changes to your subscription:

http://www.postgresql.org/mailpref/pgsql-performance

I'm just a user so I don't have definitive knowledge of this, but my experience seems to indicate that the = ANY(ARRAY(SELECT ...)) does the select and turns it into an array and then uses that in the where clause in a manner similar to a hard coded list of values, like IN (1, 2, 3, ...). In theory, the planner could do the same sort of things with the IN (SELECT ...) but my experience seems to indicate that in some cases it decides not to use an index that it could.

One specific example I know of is that at least in PostgreSQL 8.3, a view with a UNION/UNION ALL will push the = ANY(ARRAY(SELECT ...)) down into the two sub-queries, but the IN (SELECT ...) will be applied after the UNION ALL.

Dave