Re: When does PostgreSQL collapse subqueries to join?

Tom Lane <tgl@xxxxxxxxxxxxx> · Wed, 10 Dec 2014 10:34:49 -0500

"Sven R. Kunze" <srkunze@xxxxxxxxxxxx> writes:
> ################ Perfect Plan ###############
> We assume all our queries to be equivalent and therefore want PostgreSQL 
> to re-plan the others to this one.

> explain analyze verbose select * from a where a.id in (select a.id from 
> a inner join text_b b1 on (a.id=b1.a_id) inner join text_b b2 on 
> (a.id=b2.a_id) where b1.x='x1' and b1.y='y1' and b2.x='x2' and b2.y='y2' 
> order by a.date desc limit 20);

> [ ... other variant cases ... ]

> ################### Slow Subqueries ##########################
> Directly querying from the subqueries performs even worse.

> explain analyze verbose select * from a where a.id in (select 
> text_b.a_id from text_b where text_b.x='x1' and text_b.y='y1') and a.id 
> in (select text_b.a_id from text_b where text_b.x='x2' and 
> text_b.y='y2') order by a.date desc limit 20;

> What needs to be done in order to feed PostgreSQL with the last query 
> and achieve the performance of the first one?

Postgres will *never* turn the last query into the first one, because
they are not in fact equivalent.  Putting the ORDER BY/LIMIT inside the
subquery has entirely different effects than putting it outside.  There's
no guarantee at all that the first query returns only 20 rows, nor that
the returned rows are in any particular order.

I'm a bit suspicious of the other aspect of your manual transformation
here too: in general semijoins (IN joins) don't commute with inner joins.
It's possible that it's okay here given the specific forms of the join
clauses, but the planner won't assume that.

			regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance